REPORT  DOCUMENTATION  PAGE 


Form  Approved  OMB  NO.  0704-0188 


The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA, 22202-4302.
Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection
of information if it does not display a currently valid OMB control number.

PLEASE  DO  NOT  RETURN  YOUR  FORM  TO  THE  ABOVE  ADDRESS. 


1.  REPORT  DATE  (DD-MM-YYYY) 


2.  REPORT  TYPE 
New  Reprint 


4.  TITLE  AND  SUBTITLE 

A Lossy Source Coding Interpretation of Wyner's Common
Information


3.  DATES  COVERED  (From  -  To) 


5a.  CONTRACT  NUMBER 
W911NF-12-1-0383


5b.  GRANT  NUMBER 


6.  AUTHORS 
Wei  Liu,  Biao  Chen,  Ge  Xu 


5c. PROGRAM ELEMENT NUMBER
611102 


5d.  PROJECT  NUMBER 


5e.  TASK  NUMBER 


5f.  WORK  UNIT  NUMBER 


7.  PERFORMING  ORGANIZATION  NAMES  AND  ADDRESSES 

Syracuse University
Office of Research
113 Bowne Hall

Syracuse, NY 13244-1200


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS 
(ES) 

U.S. Army Research Office
P.O. Box 12211

Research Triangle Park, NC 27709-2211


8.  PERFORMING  ORGANIZATION  REPORT 
NUMBER 


10.  SPONSOR/MONITOR'S  ACRONYM(S) 
ARO 


11.  SPONSOR/MONITOR'S  REPORT 
NUMBER(S) 

61953-CS.13


12.  DISTRIBUTION  AVAILIBILITY  STATEMENT 
Approved for public release; distribution is unlimited.


13.  SUPPLEMENTARY  NOTES 

The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department
of the Army position, policy or decision, unless so designated by other documentation.


14.  ABSTRACT 

Wyner’s  common  information  was  originally  defined  for  a  pair  of  dependent  discrete  random  variables.  Its 
significance  is  largely  reflected  in,  and  also  confined  to,  several  existing  interpretations  in  various  source  coding 
problems.  This  paper  attempts  to  expand  its  practical  significance  by  providing  a  new  operational  interpretation.  In 
the  context  of  the  Gray-Wyner  network,  it  is  established  that  Wyner’s  common  information  has  a  new  lossy  source 
coding  interpretation.  Specifically,  it  is  established  that,  under  suitable  conditions,  Wyner’s  common  information 



15.  SUBJECT  TERMS 

Common  information,  Gray-Wyner  network,  rate  distortion  function 




16. SECURITY CLASSIFICATION OF:
a. REPORT: UU
b. ABSTRACT: UU
c. THIS PAGE: UU

17. LIMITATION OF ABSTRACT: UU

15. NUMBER OF PAGES

Biao  Chen 


19b.  TELEPHONE  NUMBER 
315-443-3332 


Standard  Form  298  (Rev  8/98) 
Prescribed by ANSI Std. Z39.18


Report  Title 

A  Lossy  Source  Coding  Interpretation  of  Wyner’s  Common  Information 

ABSTRACT 

Wyner's common information was originally defined for a pair of dependent discrete random variables. Its
significance is largely reflected in, and also confined to, several existing interpretations in various source coding
problems. This paper attempts to expand its practical significance by providing a new operational interpretation. In
the context of the Gray-Wyner network, it is established that Wyner's common information has a new lossy source
coding interpretation. Specifically, it is established that, under suitable conditions, Wyner's common information
equals the smallest common message rate when the total rate is arbitrarily close to the rate distortion function with
joint decoding for the Gray-Wyner network. A surprising
observation is that such equality holds independent of the values of distortion constraints as long as the distortions are
within some distortion region. The new lossy source coding interpretation provides the first meaningful justification
for defining Wyner's common information for continuous random variables and the result can also be extended to
that of multiple variables. Examples are given for characterizing the rate distortion region for the Gray-Wyner lossy
source coding problem and for identifying conditions under which Wyner's common information equals that of the
smallest common rate. As a by-product, the explicit
expression for the common information between a pair of Gaussian random variables is obtained.


REPORT  DOCUMENTATION  PAGE  (SF298) 
(Continuation  Sheet) 

Continuation  for  Block  13 


ARO Report Number 61953.13-CS
A  Lossy  Source  Coding  Interpretation  of  Wyner’... 


Block  13:  Supplementary  Note 

©2016. Published in IEEE Transactions on Information Theory, Vol. 62 (2) (2016). DoD Components reserve a royalty-free,
nonexclusive and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes, and to authorize
others  to  do  so  (DODGARS  §32.36).  The  views,  opinions  and/or  findings  contained  in  this  report  are  those  of  the  author(s)  and 
should  not  be  construed  as  an  official  Department  of  the  Army  position,  policy  or  decision,  unless  so  designated  by  other 
documentation. 


Approved  for  public  release;  distribution  is  unlimited. 




IEEE  TRANSACTIONS  ON  INFORMATION  THEORY,  VOL.  62,  NO.  2,  FEBRUARY  2016 


A  Lossy  Source  Coding  Interpretation 
of  Wyner’s  Common  Information 

Ge  Xu,  Wei  Liu,  and  Biao  Chen,  Fellow,  IEEE 


Abstract — Wyner’s  common  information  was  originally 
defined  for  a  pair  of  dependent  discrete  random  variables. 
Its  significance  is  largely  reflected  in,  and  also  confined  to,  several 
existing interpretations in various source coding problems. This
paper  attempts  to  expand  its  practical  significance  by  providing  a 
new  operational  interpretation.  In  the  context  of  the  Gray-Wyner 
network,  it  is  established  that  Wyner’s  common  information 
has  a  new  lossy  source  coding  interpretation.  Specifically,  it 
is  established  that,  under  suitable  conditions,  Wyner’s  common 
information equals the smallest common message rate when
the  total  rate  is  arbitrarily  close  to  the  rate  distortion  function 
with  joint  decoding  for  the  Gray-Wyner  network.  A  surprising 
observation is that such equality holds independent of the values
of  distortion  constraints  as  long  as  the  distortions  are  within  some 
distortion  region.  The  new  lossy  source  coding  interpretation 
provides  the  first  meaningful  justification  for  defining  Wyner’s 
common  information  for  continuous  random  variables  and  the 
result  can  also  be  extended  to  that  of  multiple  variables.  Examples 
are  given  for  characterizing  the  rate  distortion  region  for  the 
Gray-Wyner  lossy  source  coding  problem  and  for  identifying 
conditions  under  which  Wyner’s  common  information  equals 
that  of  the  smallest  common  rate.  As  a  by-product,  the  explicit 
expression  for  the  common  information  between  a  pair  of 
Gaussian  random  variables  is  obtained. 

Index  Terms — Common  information,  Gray-Wyner  network, 
rate  distortion  function. 

I.  Introduction 

Consider a pair of dependent random variables
$X$ and $Y$ with joint distribution $p(x, y)$, which denotes
either the probability density function if $X$ and $Y$ are continuous
or the probability mass function if $X$ and $Y$ are discrete.
Quantifying the information that is common between $X$ and $Y$
has been a classical problem both in information theory and in
mathematical statistics [1]-[4]. The most widely used notion

Manuscript  received  January  8,  2013;  revised  February  9,  2015;  accepted 
September  30,  2015.  Date  of  publication  December  8,  2015;  date  of  current 
version  January  18,  2016.  This  work  was  supported  in  part  by  the  U.S.  Army 
Research Office under Award W911NF-12-1-0383, in part by the Air Force
Office  of  Scientific  Research  under  Award  FA9550-10-1-0458,  and  in  part 
by  the  National  Science  Foundation  under  Award  1218289.  This  paper 
was  presented  at  the  2010  Annual  Allerton  Conference  on  Communication, 
Control,  and  Computing  and  the  2011  Annual  Conference  on  Information 
Sciences  and  Systems. 

G.  Xu  was  with  the  Department  of  Electrical  Engineering  and  Computer 
Science,  Syracuse  University,  Syracuse,  NY  13244  USA.  She  is  now 
with  Nuance  Communications,  Burlington,  MA  01803  USA  (e-mail: 
ge.xu@nuance.com). 

W.  Liu  was  with  the  Department  of  Electrical  Engineering  and  Computer 
Science,  Syracuse  University,  Syracuse,  NY  13244  USA.  He  is  now  with 
Bloomberg L.P., New York, NY 10022 USA (e-mail: wliusyr@gmail.com).

B.  Chen  is  with  the  Department  of  Electrical  Engineering  and 
Computer  Science,  Syracuse  University,  Syracuse,  NY  13244  USA  (e-mail: 
bichen@syr.edu).

Communicated  by  Y.  Oohama,  Associate  Editor  for  Source  Coding. 

Digital Object Identifier 10.1109/TIT.2015.2506560


is  Shannon’s  mutual  information,  defined  as 


$$I(X; Y) = E\left[\log \frac{p(x, y)}{p(x)\,p(y)}\right],$$

where $p(x)$ and $p(y)$ are the marginal distributions of
$X$ and $Y$ corresponding to the joint distribution $p(x, y)$ and
$E[\cdot]$ denotes expectation with respect to $p(x, y)$. Shannon's
mutual information measures the amount of uncertainty reduction
in one variable by observing the other. Its significance
lies in its applications to a broad range of problems in which
concrete operational meanings of $I(X; Y)$ can be established.
These include both source and channel coding problems in
information and communication theory [5] and hypothesis
testing problems in statistical inference [6].
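For discrete variables, the definition above can be evaluated directly from the joint probability mass function. The following is a minimal sketch, assuming NumPy is available; the function name `mutual_information` and the two example distributions (a doubly symmetric binary source with crossover probability 0.1 and an independent pair) are illustrative choices and do not come from the paper.

```python
import numpy as np

def mutual_information(p_xy):
    """Return I(X;Y) in bits for a joint pmf given as a 2-D array p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X, shape (|X|, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, |Y|)
    product = p_x @ p_y                     # p(x) p(y) for every pair (x, y)
    mask = p_xy > 0                         # terms with p(x, y) = 0 contribute 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / product[mask])))

# Illustrative inputs (assumptions of this sketch, not values from the paper).
a0 = 0.1
dsbs = np.array([[(1 - a0) / 2, a0 / 2],
                 [a0 / 2, (1 - a0) / 2]])
independent = np.outer([0.3, 0.7], [0.6, 0.4])

mi_dsbs = mutual_information(dsbs)
mi_independent = mutual_information(independent)
print(mi_dsbs, mi_independent)
```

For the symmetric binary pair the value should agree with $1 - h(a_0)$, where $h$ is the binary entropy function, and for the independent pair it should vanish up to floating-point error.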

Other notions of information have also been defined
between a pair of dependent variables. Most notable
among them are Gács and Körner's common randomness
$K(X, Y)$ [2] and Wyner's common information $C(X, Y)$ [4].
Gács and Körner's common randomness is defined as the
maximum number of common bits per symbol that can
be independently extracted from $X$ and $Y$. Quite naturally,
$K(X, Y)$ has found extensive applications in secure communications,
e.g., for key generation [7]-[9]. More recently, a new
interpretation of $K(X, Y)$ using the Gray-Wyner source coding
network was given in [10]. It was noted in [2] and [11] that
the definition of $K(X, Y)$ is rather restrictive in that $K(X, Y)$
equals 0 in most cases except for the special case when
$X = (X', V)$ and $Y = (Y', V)$ and $X'$, $Y'$, $V$ are independent
variables, or those $(X, Y)$ pairs that can be converted to such
a dependence structure through relabeling the realizations,
i.e., permutation of the joint distribution matrix. Notice also
that $K(X, Y)$ is defined only for discrete random variables.

Wyner's common information was originally defined for a
pair of discrete random variables with finite alphabet as

$$C(X, Y) = \inf_{X - W - Y} I(X, Y; W). \qquad (1)$$

Here, the infimum is taken over all auxiliary random variables
$W$ such that $X$, $W$, and $Y$ form a Markov chain. The
operational meanings of $C(X, Y)$ available in existing literature
include the minimum common rate for the Gray-Wyner
lossless source coding problem under a sum rate constraint,
the minimum rate of a common input of two independent
random channels for distribution approximation [4], and strong
coordination capacity of a two-node network without common
randomness and with actions assigned at one node [12].
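Because (1) is an infimum over all $W$ satisfying the Markov chain, any single feasible $W$ yields an upper bound on $C(X, Y)$. The sketch below evaluates the objective $I(X, Y; W)$ for one such candidate, assuming NumPy. The doubly symmetric binary source with crossover probability `a0`, the uniform binary $W$, and the two identical binary symmetric channels with crossover `a1` are assumptions made here for illustration; the sketch does not attempt the minimization itself.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

a0 = 0.1                                   # assumed P(X != Y) of the target source
a1 = 0.5 * (1 - np.sqrt(1 - 2 * a0))       # chosen so that 2 * a1 * (1 - a1) == a0

# Candidate construction: W uniform on {0, 1}; X and Y are W passed through two
# independent binary symmetric channels, so X - W - Y holds by construction.
bsc = np.array([[1 - a1, a1],
                [a1, 1 - a1]])             # bsc[w, x] = p(x | w)
p_wxy = 0.5 * np.einsum('wx,wy->wxy', bsc, bsc)

p_xy = p_wxy.sum(axis=0)                   # induced joint pmf of (X, Y)
p_w = p_wxy.sum(axis=(1, 2))

# I(X, Y; W) = H(X, Y) + H(W) - H(W, X, Y)
objective = entropy(p_xy) + entropy(p_w) - entropy(p_wxy)
mutual_info = entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)
print(objective, mutual_info)
```

The induced `p_xy` reproduces the symmetric binary joint distribution, so `objective` is a valid upper bound on $C(X, Y)$ for that source, and it is no smaller than $I(X; Y)$, as it must be for any $W$ that renders $X$ and $Y$ conditionally independent.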

While Wyner only considered the definition of common
information for discrete random variables, the expression
specified by (1) directly applies to a pair of continuous
random variables. However, absent any meaningful
interpretation, the computed value for a pair of continuous
random variables is largely pointless. Notice that Wyner's
original interpretations for $C(X, Y)$ are only applicable to
discrete random variables.

This paper presents a new lossy source coding interpretation
of Wyner's common information using the Gray-Wyner network.
Specifically, we show that, for the Gray-Wyner network,
Wyner's common information is precisely the smallest common
message rate for a certain range of distortion constraints
when the total rate is arbitrarily close to the rate distortion
function with joint decoding. As the common information is
only a function of the joint distribution, this smallest common
rate remains constant even if the distortion constraints vary,
as long as they are within a specific distortion region. With
this new interpretation, a concrete practical interpretation is
thus associated with Wyner's common information defined
for a pair of continuous random variables. There has also
been recent effort in characterizing the common message rate
for lossy source coding using the Gray-Wyner network [16].
We establish the equivalence between the characterization
in [16] and an alternative characterization presented in the
present paper.

Computing Wyner's common information is known to be a
challenging problem; $C(X, Y)$ was only resolved for several
special cases as described in [4] and [13]. Along with our
generalizations of Wyner's common information, we provide two
new examples where we can explicitly evaluate the common
information of multiple dependent variables. In particular, we
derive, through an estimation theoretic approach, $C(X, Y)$ for
a bivariate Gaussian source and its extension to the multivariate
case with a certain correlation structure.

The rest of the paper is organized as follows. Section II
reviews Wyner's two approaches for the common information
of two discrete random variables, the marginal and conditional
rate distortion functions and their relationships, and the concept
of successively refinable sources. In Section III, we provide
a new interpretation of Wyner's common information using
Gray-Wyner's lossy source coding network. Specifically, we
prove that for the Gray-Wyner network, Wyner's common
information is precisely the smallest common message rate
for a certain range of distortion constraints when the total rate
is arbitrarily close to the rate distortion function with joint
decoding. In Section IV, two examples, the doubly symmetric
binary source and the bivariate Gaussian source, are used to
illustrate the lossy source coding interpretation of Wyner's
common information. The common information for the bivariate
Gaussian source and its extension to the multivariate case are
also derived in Section IV. Section V concludes this paper.

Notation: Throughout this paper, we use calligraphic
letter $\mathcal{X}$ to denote the alphabet and $p(x)$ to denote either the
probability mass function or the probability density function of a random
variable $X$.

II.  Existing  Results 

A.  Wyner’s  Result 

Given two discrete random variables $X_1$ and $X_2$ with
distribution $p(x_1, x_2)$, Wyner defined their common information
as in equation (1) and provided two operational
meanings for this definition.

Fig. 1. Source coding over a simple network.

The first approach is based on the model shown in Fig. 1,
which is a source coding network first studied by
Gray and Wyner in [17]. In this model, the encoder observes
a pair of sequences $(X_1^n, X_2^n)$. There are 2 receivers, with the
$i$th receiver only interested in recovering the sequence $X_i^n$,
$i = 1, 2$. The encoder encodes the source into 3 messages
$W_0, W_1, W_2$: one is a public message available at all receivers,
while the other 2 messages are private messages only available
at the corresponding receivers. For $m = 1, 2, \ldots$, let
$I_m = \{0, 1, 2, \ldots, m - 1\}$. An $(n, M_0, M_1, M_2)$ code is defined
as below.

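Definition 1: An $(n, M_0, M_1, M_2)$ code consists of

• An encoder mapping

$$f : \mathcal{X}_1^n \times \mathcal{X}_2^n \to I_{M_0} \times I_{M_1} \times I_{M_2},$$

• 2 decoder mappings

$$g_i : I_{M_i} \times I_{M_0} \to \mathcal{X}_i^n, \quad i = 1, 2.$$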

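For an $(n, M_0, M_1, M_2)$ code, let $f(X_1^n, X_2^n) = (W_0, W_1, W_2)$
and $\hat{X}_i^n = g_i(W_i, W_0)$, $i = 1, 2$. The
probability of error can be obtained by

$$P_e^{(n)} = \frac{1}{2n} \sum_{i=1}^{2} E\left[d_H(X_i^n, \hat{X}_i^n)\right], \qquad (2)$$

where $d_H(u^n, \hat{u}^n)$ is the Hamming distance between
$u^n$ and $\hat{u}^n$.

A number $R_0$ is said to be achievable if for any $\epsilon > 0$,
there exists, for $n$ sufficiently large, an $(n, M_0, M_1, M_2)$ code
such that

$$M_0 \le 2^{nR_0}, \qquad (3)$$

$$\sum_{i=0}^{2} \frac{1}{n} \log M_i \le H(X_1, X_2) + \epsilon, \qquad (4)$$

$$P_e^{(n)} \le \epsilon. \qquad (5)$$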

Wyner defined $C_1$ as the infimum of all achievable $R_0$
and proved in [4] that

$$C_1 = C(X_1, X_2). \qquad (6)$$

The second approach is shown in Fig. 2. In this approach,
the joint distribution $p(x_1^n, x_2^n) = \prod_{i=1}^{n} p(x_{1i}, x_{2i})$ is
approximated by the output distribution of a pair of random
number generators. A common input $W$, uniformly distributed
on $\mathcal{W} = \{1, \ldots, 2^{nR_0}\}$, is sent to two separate processors
which are independent of each other. These processors
(random number generators) generate independent and
identically distributed (i.i.d.) sequences according to two
distributions $q_1(x_1^n|w)$ and $q_2(x_2^n|w)$ respectively. The output
sequences of the two processors are denoted by $\tilde{X}_1^n$ and $\tilde{X}_2^n$
respectively and the joint distribution of the output sequences
is given by

$$q(x_1^n, x_2^n) = \sum_{w \in \mathcal{W}} \frac{1}{|\mathcal{W}|}\, q_1(x_1^n|w)\, q_2(x_2^n|w).$$

Fig. 2. Random variables generator.

Let

$$D_n(q, p) = \frac{1}{n} \sum_{x_1^n, x_2^n} q(x_1^n, x_2^n) \log \frac{q(x_1^n, x_2^n)}{p(x_1^n, x_2^n)}.$$

Let $C_2$ be the infimum of rate $R_0$ for the common input such
that for any $\epsilon > 0$, there exists a pair of distributions $q_1(x_1^n|w)$,
$q_2(x_2^n|w)$ and $n$ such that $D_n(q, p) \le \epsilon$.

It is proved in [4] that

$$C_2 = C(X_1, X_2). \qquad (7)$$

It is worth noting that Wyner's common information can
be generalized to that of multiple dependent random variables
by replacing the Markov condition with a conditional independence
structure; the latter is equivalent to the Markov condition
for three random variables yet is applicable to multiple random
variables. As established in [31], such a generalization is
meaningful in that both interpretations $C_1$ and $C_2$ carry over
to the case involving multiple random variables.


B. Joint, Marginal and Conditional Rate Distortion Functions

In this section, we review the joint, marginal and conditional
rate distortion functions and their relations which will be used
in the following sections. Two-dimensional sources will be
considered and the results can be generalized immediately to
$N$-dimensional vector sources.

Given a two-dimensional source $(X_1, X_2)$ with probability
distribution $p(x_1, x_2)$ and two distortion measures $d_1(x_1, \hat{x}_1)$
and $d_2(x_2, \hat{x}_2)$ defined on $\mathcal{X}_1 \times \hat{\mathcal{X}}_1$ and $\mathcal{X}_2 \times \hat{\mathcal{X}}_2$, the joint rate
distortion function is given by

$$R_{X_1 X_2}(D_1, D_2) = \inf_{p(\hat{x}_1 \hat{x}_2 | x_1 x_2)} I(X_1 X_2; \hat{X}_1 \hat{X}_2), \qquad (8)$$

where the infimum is taken over all $p(\hat{x}_1 \hat{x}_2 | x_1 x_2)$ such that
$E d_1(X_1, \hat{X}_1) \le D_1$, $E d_2(X_2, \hat{X}_2) \le D_2$.

The conditional rate distortion function is defined as [20]

$$R_{X_1|X_2}(D_1) = \inf_{p(\hat{x}_1 | x_1 x_2)} I(X_1; \hat{X}_1 | X_2), \qquad (9)$$

where the infimum is taken over all $p(\hat{x}_1 | x_1, x_2)$ satisfying
$E d_1(X_1, \hat{X}_1) \le D_1$. The joint, marginal and conditional rate
distortion functions satisfy the following inequalities.

Lemma 1 [18], [19]: Given a two-dimensional source
$(X_1, X_2)$ with joint distribution $p(x_1, x_2)$ and two distortion
measures $d_1(x_1, \hat{x}_1)$, $d_2(x_2, \hat{x}_2)$ defined respectively on
$\mathcal{X}_1 \times \hat{\mathcal{X}}_1$ and $\mathcal{X}_2 \times \hat{\mathcal{X}}_2$, the rate distortion functions satisfy
the following inequalities

$$R_{X_1 X_2}(D_1, D_2) \ge R_{X_1|X_2}(D_1) + R_{X_2}(D_2), \qquad (10a)$$

$$R_{X_1|X_2}(D_1) \ge R_{X_1}(D_1) - I(X_1; X_2), \qquad (10b)$$

$$R_{X_1 X_2}(D_1, D_2) \ge R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2), \qquad (10c)$$

$$R_{X_1}(D_1) \ge R_{X_1|X_2}(D_1), \qquad (11a)$$

$$R_{X_1}(D_1) + R_{X_2}(D_2) \ge R_{X_1 X_2}(D_1, D_2). \qquad (11b)$$


A sufficient condition for equality in (10) is that the
optimum backward test channels for the functions on
the left side of each equation factor appropriately, i.e.,
for (10a) $p_b(x_1 x_2 | \hat{x}_1 \hat{x}_2) = p(x_1 | \hat{x}_1 x_2)\, p(x_2 | \hat{x}_2)$, for (10b)
$p_b(x_1 | \hat{x}_1 x_2) = p(x_1 | \hat{x}_1)$ and for (10c) $p_b(x_1 x_2 | \hat{x}_1 \hat{x}_2) =
p(x_1 | \hat{x}_1)\, p(x_2 | \hat{x}_2)$. Equalities hold in (11) if and only if
$X_1$ and $X_2$ are independent.

It is shown in [18] that under quite general conditions,
equalities in (10) hold for small values of distortion.
To state this result, we introduce the Extended Shannon Lower
Bounds (ESLB) of rate distortion functions [18], [20]. Let
us denote by $R_U^{(L)}(D)$ the ESLB of $R_U(D)$. $R_U^{(L)}(D)$ is a lower
bound of $R_U(D)$ and is derived by removing the constraint
that the minimizing test channel $p_t(\hat{u}|u)$ is nonnegative in
the minimization defining $R_U(D)$. The detailed definition and
calculation of ESLB can be found in [20].

If the reproducing alphabet is identical to the source alphabet,
the marginal, joint and conditional rate distortion functions
equal their corresponding ESLBs for small distortion regions,
while these ESLBs satisfy the equalities in (10). The detail of
this result is given in Lemma 2 and the following notations are
used. $\mathcal{D}$ denotes a surface in an $m$-dimensional space and the
inequality $\Delta \le \mathcal{D}$ means that there exists a vector $\beta \in \mathcal{D}$ such
that $\Delta \le \beta$. If there is no such vector, $\Delta > \mathcal{D}$. Likewise,
$\mathcal{D}_1 \le \mathcal{D}_2$ means that $\beta \le \mathcal{D}_2$ for any $\beta \in \mathcal{D}_1$ [18].

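Lemma 2 [18]: Given a two-dimensional source $(X_1, X_2)$
with joint distribution $p(x_1, x_2)$ and alphabet $\mathcal{X}_1 \times \mathcal{X}_2$, reproduction
alphabets $\hat{\mathcal{X}}_1 = \mathcal{X}_1$, $\hat{\mathcal{X}}_2 = \mathcal{X}_2$ and two per-letter
distortion measures $d_1(x_1, \hat{x}_1)$ and $d_2(x_2, \hat{x}_2)$ that satisfy

$$d_i(x_i, \hat{x}_i) > d_i(x_i, x_i) = 0, \quad \hat{x}_i \ne x_i, \quad i = 1, 2, \qquad (12)$$

there exist strictly positive surfaces $\mathcal{D}(X_1 X_2)$, $\mathcal{D}(X_1|X_2)$,
$\mathcal{D}(X_1)$ and $\mathcal{D}(X_2)$ such that

$$\begin{aligned}
R_{X_1 X_2}(D_1, D_2) &= R_{X_1 X_2}^{(L)}(D_1, D_2), & (D_1, D_2) &\le \mathcal{D}(X_1 X_2), \\
R_{X_1|X_2}(D_1) &= R_{X_1|X_2}^{(L)}(D_1), & D_1 &\le \mathcal{D}(X_1|X_2), \\
R_{X_1}(D_1) &= R_{X_1}^{(L)}(D_1), & D_1 &\le \mathcal{D}(X_1), \\
R_{X_2}(D_2) &= R_{X_2}^{(L)}(D_2), & D_2 &\le \mathcal{D}(X_2),
\end{aligned} \qquad (13)$$

and

$$\mathcal{D}(X_1|X_2) \le \mathcal{D}(X_1), \qquad (14)$$

$$\mathcal{D}(X_1 X_2) \le (\mathcal{D}(X_1|X_2), \mathcal{D}(X_2)) \le (\mathcal{D}(X_1), \mathcal{D}(X_2)). \qquad (15)$$

Finally,

$$R_{X_1 X_2}^{(L)}(D_1, D_2) = R_{X_1|X_2}^{(L)}(D_1) + R_{X_2}^{(L)}(D_2), \qquad (16)$$

$$R_{X_1|X_2}^{(L)}(D_1) = R_{X_1}^{(L)}(D_1) - I(X_1; X_2), \qquad (17)$$

$$R_{X_1 X_2}^{(L)}(D_1, D_2) = R_{X_1}^{(L)}(D_1) + R_{X_2}^{(L)}(D_2) - I(X_1; X_2). \qquad (18)$$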

It is apparent that if $(D_1, D_2) \le \mathcal{D}(X_1 X_2)$, the ESLBs of the
joint, marginal and conditional rate distortion functions yield
their corresponding rates; the equations in (16)-(18) then imply
equalities in (10).

C.  Successive  Refinement 

Successive  refinement  refers  to  a  source  coding  scenario 
where  a  source  is  coded  in  multiple  stages  from  coarser 
descriptions  to  finer  descriptions  and  the  description  at  each 
stage  is  optimal  [26]. 

Consider a sequence of i.i.d. random variables $U^n$ where
each $U_i$ is drawn from a source alphabet $\mathcal{U}$ with distribution
$p(u)$. Denote the reconstruction alphabet as $\hat{\mathcal{U}}$. Let
$d : \mathcal{U} \times \hat{\mathcal{U}} \to [0, \infty)$ be the single letter distortion measure
on $\mathcal{U} \times \hat{\mathcal{U}}$ which induces the distortion measure on $\mathcal{U}^n \times \hat{\mathcal{U}}^n$
according to

$$d(u^n, \hat{u}^n) = \frac{1}{n} \sum_{i=1}^{n} d(u_i, \hat{u}_i). \qquad (19)$$

The $U^n$ sequence is first described by a message at rate $R_1$,
which incurs distortion $D_1$; then an addendum is provided to
the first message at rate $(R_2 - R_1)$ so that the combined message
has distortion $D_2$ ($D_1 \ge D_2$). If $R_1 = R_U(D_1)$, $R_2 = R_U(D_2)$,
we say that the sequence is successively refinable.
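To make the two-stage rate bookkeeping concrete, the sketch below works through the arithmetic for a scalar Gaussian source under squared error, a standard example of a successively refinable source. The source model, the closed form $R(D) = \frac{1}{2}\log_2(\sigma^2/D)$ for $D \le \sigma^2$, and the numerical values of the variance and distortions are assumptions of this illustration; none of them are taken from the paper.

```python
import math

def gaussian_rate(variance, distortion):
    """Rate distortion function (bits/sample) of a Gaussian source, squared error."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    if distortion >= variance:
        return 0.0                      # no bits needed once D reaches the variance
    return 0.5 * math.log2(variance / distortion)

variance = 1.0                          # assumed source variance
d1, d2 = 0.25, 0.0625                   # coarse and fine distortion targets, d1 >= d2

r1 = gaussian_rate(variance, d1)        # rate of the first (coarse) description
r2 = gaussian_rate(variance, d2)        # total rate after the addendum
addendum = r2 - r1                      # rate of the refinement message alone

print(r1, r2, addendum)
```

Under these assumptions the addendum rate depends only on the ratio $D_1 / D_2$, namely $\frac{1}{2}\log_2(D_1/D_2)$, which is what one expects when both stages operate on the rate distortion curve.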

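Definition 2 [26]: A source $U$ is said to be successively
refinable from distortion $D_1$ to distortion $D_2$ for $D_1 \ge D_2$
if there exists a sequence of encoding functions $f_1 : \mathcal{U}^n \to
\{1, \ldots, 2^{nR_1}\}$ and $f_2 : \mathcal{U}^n \to \{1, \ldots, 2^{n(R_2 - R_1)}\}$ and
reconstruction functions $g_1 : \{1, \ldots, 2^{nR_1}\} \to \hat{\mathcal{U}}^n$ and
$g_2 : \{1, \ldots, 2^{nR_1}\} \times \{1, \ldots, 2^{n(R_2 - R_1)}\} \to \hat{\mathcal{U}}^n$ such that for
$\hat{U}_1^n = g_1(f_1(U^n))$ and $\hat{U}_2^n = g_2(f_1(U^n), f_2(U^n))$,

$$R_1 = R_U(D_1), \quad \limsup_{n \to \infty} E\, d(U^n, \hat{U}_1^n) \le D_1, \qquad (20)$$

$$R_2 = R_U(D_2), \quad \limsup_{n \to \infty} E\, d(U^n, \hat{U}_2^n) \le D_2. \qquad (21)$$

The source $U$ is said to be successively refinable if it is
successively refinable from distortion $D_1$ to distortion $D_2$ for
every $D_1 \ge D_2$.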

It is shown in [26] that successive refinement from
a coarse description $\hat{U}_1$ with distortion $D_1$ to a finer
description $\hat{U}_2$ with distortion $D_2$ can be achieved if and only
if the conditional distributions $p(\hat{u}_1|u)$ and $p(\hat{u}_2|u)$ which
achieve $R_U(D_i) = I(U; \hat{U}_i)$, $i = 1, 2$, are such that $U$, $\hat{U}_2$, $\hat{U}_1$
form a Markov chain.

Theorem 1 [26]: $U$ is successively refinable from distortion
$D_1$ to $D_2$ ($D_1 \ge D_2$) if and only if the optimal random
variables $\hat{U}_i$ achieving $(D_i, R_U(D_i))$, $i = 1, 2$, satisfy the
Markov chain

$$U - \hat{U}_2 - \hat{U}_1. \qquad (22)$$

A similar definition of successive refinement applies to
vector sources with individual distortion constraints and a
similar result to Theorem 1 can also be obtained for vector
sources [29]. We state the Markovian characterization of
successive refinement for a pair of random variables $(U, V)$
here.

Theorem 2 [29]: The source $(U, V)$ is successively refinable
from $(D_U^1, D_V^1)$ to $(D_U^2, D_V^2)$ with $(D_U^1, D_V^1) \ge
(D_U^2, D_V^2)$ if and only if the optimal random variables $(\hat{U}_i, \hat{V}_i)$
achieving $((D_U^i, D_V^i), R_{UV}(D_U^i, D_V^i))$ for $i = 1, 2$ satisfy the
Markov chain

$$(U, V) - (\hat{U}_2, \hat{V}_2) - (\hat{U}_1, \hat{V}_1), \qquad (23)$$

where $(D_U^1, D_V^1) \ge (D_U^2, D_V^2)$ is a shorthand notation for
$D_U^1 \ge D_U^2$ and $D_V^1 \ge D_V^2$.

III. The Lossy Source Coding Interpretation
of Wyner's Common Information

The  common  information  defined  in  (1)  equally  applies  to 
that  of  continuous  random  variables.  However,  such  definitions 
are  only  meaningful  when  they  are  associated  with  concrete 
operational  interpretations.  In  this  section,  we  develop  a  lossy 
source  coding  interpretation  of  Wyner’s  common  information 
using  the  Gray-Wyner  network. 

A.  Lossy  Gray-Wyner  Source  Coding 

In Section II-A, Wyner's first approach to explain the common
information of discrete random variables is based on the
lossless source coding theorem of the Gray-Wyner network.
Let $(R_0, R_1, R_2)$ be the rate triple of the three messages. The
set of triples $(R_0, R_1, R_2)$ satisfying

$$R_0 + R_1 + R_2 = H(X_1, X_2)$$

is referred to as the "Pangloss plane", as $H(X_1, X_2)$ is the
minimum sum rate needed to recover $(X_1, X_2)$ with joint
decoding and hence provides a natural lower bound on the sum
rate for the Gray-Wyner source coding problem.

Let $\mathcal{R}_1$ be the set of all achievable rate triples $(R_0, R_1, R_2)$.
For the rate triples that lie on the intersection of the region $\mathcal{R}_1$
and the Pangloss plane, a total rate of $H(X_1, X_2)$ can be split
into three parts for $(R_0, R_1, R_2)$ and the source $(X_1, X_2)$ can
be recovered losslessly at the receivers. Wyner's first approach
actually shows that $C(X_1, X_2)$ is the minimum $R_0$ that lies on
the intersection of the achievable region $\mathcal{R}_1$ and the Pangloss
plane.
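The role of the Markov condition in reaching the Pangloss plane can be checked numerically. The sketch below, assuming NumPy, takes the rate triple $(I(X_1, X_2; W), H(X_1|W), H(X_2|W))$ associated with an auxiliary variable $W$ and measures how far its sum rate lies above $H(X_1, X_2)$. Treating this triple as achievable relies on the lossless Gray-Wyner characterization of [17], which is taken as given here; the doubly symmetric binary source and the two choices of $W$ are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def rate_triple(p_wxy):
    """Return (I(X1,X2;W), H(X1|W), H(X2|W)) for a joint pmf p_wxy[w, x1, x2]."""
    h_w = entropy(p_wxy.sum(axis=(1, 2)))
    r0 = entropy(p_wxy.sum(axis=0)) + h_w - entropy(p_wxy)
    r1 = entropy(p_wxy.sum(axis=2)) - h_w      # H(W, X1) - H(W)
    r2 = entropy(p_wxy.sum(axis=1)) - h_w      # H(W, X2) - H(W)
    return r0, r1, r2

a0 = 0.1                                        # assumed crossover probability
p_x1x2 = np.array([[(1 - a0) / 2, a0 / 2],
                   [a0 / 2, (1 - a0) / 2]])
joint_entropy = entropy(p_x1x2)

# Choice 1: W = X1, which trivially makes X1 and X2 conditionally independent.
p_copy = np.zeros((2, 2, 2))
for x1 in range(2):
    p_copy[x1, x1, :] = p_x1x2[x1, :]

# Choice 2: W constant (no common message), so the Markov chain fails.
p_const = p_x1x2[np.newaxis, :, :]

gap_copy = sum(rate_triple(p_copy)) - joint_entropy
gap_const = sum(rate_triple(p_const)) - joint_entropy
print(gap_copy, gap_const)
```

In general the gap equals $I(X_1; X_2 | W)$, so it vanishes exactly when $X_1 - W - X_2$ holds. With $W = X_1$ the triple lies on the Pangloss plane, although with a common rate of $H(X_1)$ that need not be minimal; with a constant $W$ the sum rate exceeds $H(X_1, X_2)$ by $I(X_1; X_2)$.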

Motivated by this relationship of Wyner's common information
and the lossless rate region $\mathcal{R}_1$, we explore the
connection of Wyner's common information and the lossy rate
distortion region of the Gray-Wyner network, which provides a
new interpretation of Wyner's common information.

The rate distortion region of the Gray-Wyner network is
defined as follows. Let $d_i(x_i, \hat{x}_i)$, $i = 1, 2$ be a given nonnegative
per-letter distortion function for $X_i$. Define $\Delta_i$, $i = 1, 2$ to
be the average distortion between the $i$th component sequence
of the encoder input and the $i$th decoder output,

$$\Delta_i = E[d_i(X_i^n, \hat{X}_i^n)] = \frac{1}{n} \sum_{k=1}^{n} E[d_i(X_{ik}, \hat{X}_{ik})]. \qquad (24)$$

An $(n, M_0, M_1, M_2)$ code with an average distortion
vector $(\Delta_1, \Delta_2)$ is said to be an $(n, M_0, M_1, M_2, \Delta_1, \Delta_2)$ rate
distortion code.
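The average distortion in (24) is an expectation of a per-letter average. As a small illustration, the sketch below, assuming NumPy, computes the per-letter average for one realization of a source block and its reconstruction, which is the quantity inside the expectation. The helper name `average_distortion`, the two distortion measures, and the sample sequences are hypothetical and chosen only for this example.

```python
import numpy as np

def average_distortion(x, x_hat, per_letter):
    """Per-letter average (1/n) * sum_k d(x_k, x_hat_k) for one pair of blocks."""
    x, x_hat = np.asarray(x), np.asarray(x_hat)
    if x.shape != x_hat.shape:
        raise ValueError("source and reconstruction blocks must have equal length")
    return float(np.mean(per_letter(x, x_hat)))

def hamming(a, b):
    """Hamming per-letter distortion: 1 where symbols differ, 0 otherwise."""
    return (a != b).astype(float)

def squared_error(a, b):
    """Squared-error per-letter distortion."""
    return (a - b) ** 2

# Hypothetical realizations of X_i^n and its reconstruction (n = 8).
x = np.array([0, 1, 1, 0, 1, 0, 0, 1])
x_hat = np.array([0, 1, 0, 0, 1, 0, 1, 1])

hamming_avg = average_distortion(x, x_hat, hamming)
mse_avg = average_distortion(np.array([1.0, 2.0]), np.array([1.5, 2.0]), squared_error)
print(hamming_avg, mse_avg)
```

Averaging such values over many independent source blocks would give an empirical estimate of $\Delta_i$ for a fixed code.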

For any $(D_1, D_2) \in \mathbb{R}_+^2$, a rate triple $(R_0, R_1, R_2)$ is said
to be $(D_1, D_2)$-achievable if for arbitrary $\epsilon > 0$, there exists,
for $n$ sufficiently large, an $(n, M_0, M_1, M_2, \Delta_1, \Delta_2)$ code such
that

$$M_i \le 2^{n(R_i + \epsilon)}, \quad i = 0, 1, 2, \qquad (25)$$

$$\Delta_i \le D_i + \epsilon, \quad i = 1, 2. \qquad (26)$$

The rate distortion region $\mathcal{R}_2(D_1, D_2)$ is defined as the set of
all $(D_1, D_2)$-achievable rate triples $(R_0, R_1, R_2)$.

This definition of lossy source coding applies both to
discrete and continuous random variables. A characterization
of the region $\mathcal{R}_2(D_1, D_2)$ is given below.

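Theorem 3 [17]: If the source $(X_1, X_2)$ has the property
that there exist $\hat{x}_1 \in \hat{\mathcal{X}}_1$ and $\hat{x}_2 \in \hat{\mathcal{X}}_2$ such that

$$E\, d_i(X_i, \hat{x}_i) < \infty, \quad i = 1, 2, \qquad (27)$$

then $\mathcal{R}_2(D_1, D_2)$ is the union of all rate tuples $(R_0, R_1, R_2)$
that satisfy

$$R_0 \ge I(X_1, X_2; W), \qquad (28)$$

$$R_i \ge R_{X_i|W}(D_i), \quad i = 1, 2, \qquad (29)$$

for some $W \sim p(w|x_1, x_2)$.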

Similar to the lossless case, we can define a number $R_0$ to be
$(D_1, D_2)$-achievable as follows. For any $(D_1, D_2) \geq 0$, a number
$R_0$ is said to be $(D_1, D_2)$-achievable if for any $\epsilon > 0$, there
exists, for $n$ sufficiently large, an $(n, M_0, M_1, M_2, \Delta_1, \Delta_2)$
code with

$$M_0 \leq 2^{nR_0}, \tag{30}$$

$$\sum_{i=0}^{2} \frac{1}{n} \log M_i \leq R_{X_1X_2}(D_1, D_2) + \epsilon, \tag{31}$$

$$\Delta_1 \leq D_1 + \epsilon, \quad \Delta_2 \leq D_2 + \epsilon. \tag{32}$$

Here (31) is also referred to as the Pangloss bound for the
lossy source coding problem with the Gray-Wyner network.
$C_3(D_1, D_2)$ is defined as the infimum of all $R_0$'s that are
$(D_1, D_2)$-achievable. Thus, $C_3(D_1, D_2)$ is the minimum common
message rate for the Gray-Wyner network with sum
rate $R_{X_1X_2}(D_1, D_2)$ while satisfying the distortion constraint.
Since $R_0 = R_{X_1X_2}(D_1, D_2)$ is always $(D_1, D_2)$-achievable, it
is obvious that

$$C_3(D_1, D_2) \leq R_{X_1X_2}(D_1, D_2). \tag{33}$$

The following theorem gives a precise characterization
of $C_3(D_1, D_2)$.

Theorem 4: If the source $(X_1, X_2)$ has the property that
there exist $\hat{x}_1 \in \hat{\mathcal{X}}_1$ and $\hat{x}_2 \in \hat{\mathcal{X}}_2$ such that

$$E\, d_i(X_i, \hat{x}_i) < \infty, \quad i = 1, 2, \tag{34}$$

then

$$C_3(D_1, D_2) = C(D_1, D_2), \tag{35}$$

where $C(D_1, D_2)$ is the solution to the following optimization
problem:

$$\inf I(X_1, X_2; W) \tag{36}$$

subject to

$$R_{X_1|W}(D_1) + R_{X_2|W}(D_2) + I(X_1, X_2; W) = R_{X_1X_2}(D_1, D_2).$$

Proof: See Appendix A. ■

The authors in [16] gave an alternative characterization of
$C_3(D_1, D_2)$. Define

$$C^*(D_1, D_2) = \inf I(X_1, X_2; W),$$

where the infimum is taken over all joint distributions for
$X_1, X_2, X_1^*, X_2^*, W$ such that

$$X_1^* - W - X_2^*, \tag{37}$$

$$(X_1, X_2) - (X_1^*, X_2^*) - W, \tag{38}$$

where $(X_1^*, X_2^*)$ achieves $R_{X_1X_2}(D_1, D_2)$. It was shown
in [16] that $C_3(D_1, D_2) = C^*(D_1, D_2)$. This, combined with
Theorem 4, establishes that

$$C(D_1, D_2) = C^*(D_1, D_2). \tag{39}$$

$C(D_1, D_2)$ is derived from the rate distortion region
$\mathcal{R}_2(D_1, D_2)$ given in Theorem 3, while the authors in [16]
chose to derive $C^*(D_1, D_2)$ from an alternative characterization
of $\mathcal{R}_2(D_1, D_2)$ given in [22]. In Appendix B, we provide
a direct proof of (39) for completeness. Also, as given in
Appendix B, a necessary condition for the equality condition
in the optimization problem (36) is

$$R_{X_1X_2|W}(D_1, D_2) = R_{X_1|W}(D_1) + R_{X_2|W}(D_2).$$

B. The Relation of $C_3(D_1, D_2)$ and the Common Information

Given our characterization of $C_3(D_1, D_2)$ in Theorem 4, we
now establish its connection with $C(X_1, X_2)$, which leads to a
new interpretation of Wyner’s common information. We begin
with the following two lemmas.

Lemma 3: Let $W$ be the random variable that achieves the
common information of $X_1$ and $X_2$. If

$$R_{X_1X_2|W}(D_1, D_2) + C(X_1, X_2) = R_{X_1X_2}(D_1, D_2),$$

then

$$C(D_1, D_2) \leq C(X_1, X_2). \tag{40}$$

Lemma 3 is a direct consequence of Theorem 4, as the
Markov chain $X_1 - W - X_2$ implies $R_{X_1X_2|W}(D_1, D_2) =
R_{X_1|W}(D_1) + R_{X_2|W}(D_2)$. Thus, the equality constraint in (36)
is satisfied. Inequality (40) follows as

$$C(D_1, D_2) \leq I(X_1, X_2; W) = C(X_1, X_2).$$

The next lemma gives a sufficient condition under which
$C(D_1, D_2) \geq C(X_1, X_2)$ is true.
Lemma 4: For any distortion pair $(D_1, D_2) \geq 0$, if the rate
distortion function satisfies

$$R_{X_1X_2}(D_1, D_2) = R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2), \tag{41}$$

then we have

$$C(D_1, D_2) \geq C(X_1, X_2).$$

Proof: See Appendix C. ■

Theorem 4, Lemmas 3 and 4, together with the relations
of marginal, joint and conditional rate distortion functions
described in Lemmas 1 and 2, allow us to determine a region
in which $C_3(D_1, D_2)$ equals the common information.

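Theorem 5: Let $(X_1, X_2)$ be a pair of random variables
with distribution $p(x_1, x_2)$ and alphabet $\mathcal{X}_1 \times \mathcal{X}_2$, where
$\mathcal{X}_1$ and $\mathcal{X}_2$ are arbitrary measurable spaces that can be discrete
or continuous. Let $W$ be any random variable achieving
the common information of $(X_1, X_2)$. Let the reproduction
alphabets be $\hat{\mathcal{X}}_1 = \mathcal{X}_1$, $\hat{\mathcal{X}}_2 = \mathcal{X}_2$ and let the two per-letter distortion
measures $d_1(x_1, \hat{x}_1)$, $d_2(x_2, \hat{x}_2)$ satisfy

$$d_i(x_i, \hat{x}_i) > d_i(x_i, x_i) = 0, \quad x_i \neq \hat{x}_i, \quad i = 1, 2. \tag{42}$$

If the following conditions are satisfied:

1) For any $x_1 \in \mathcal{X}_1$, $x_2 \in \mathcal{X}_2$ and $w \in \mathcal{W}$, $p(w|x_1, x_2) > 0$,

2) There exist $\hat{x}_1 \in \hat{\mathcal{X}}_1$ and $\hat{x}_2 \in \hat{\mathcal{X}}_2$ such that

$$E\, d_i(X_i, \hat{x}_i) < \infty, \quad i = 1, 2, \tag{43}$$

then there exists a strictly positive surface $\gamma = (\gamma_1, \gamma_2)$ such
that, for $0 \leq (D_1, D_2) \leq \gamma$,

$$C_3(D_1, D_2) = C(X_1, X_2). \tag{44}$$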

Proof: Because of the first condition in Theorem 5, applying
Lemma 2 to the random variable triple $(X_1, X_2, W)$ yields
that there exists a strictly positive surface $\mathcal{D}(X_1X_2|W)$ such
that for any $0 \leq (D_1, D_2) \leq \mathcal{D}(X_1X_2|W)$, $R_{X_1X_2|W}(D_1, D_2)$ and
$R_{X_1X_2}(D_1, D_2)$ equal their corresponding ESLBs, which satisfy

$$I(X_1, X_2; W) + R^{(L)}_{X_1X_2|W}(D_1, D_2) = R^{(L)}_{X_1X_2}(D_1, D_2). \tag{45}$$

Thus for any $0 \leq (D_1, D_2) \leq \mathcal{D}(X_1X_2|W)$,

$$I(X_1, X_2; W) + R_{X_1X_2|W}(D_1, D_2) = R_{X_1X_2}(D_1, D_2). \tag{46}$$

Also by Lemma 2, there exists a strictly positive surface
$\mathcal{D}(X_1X_2)$ such that for any $0 \leq (D_1, D_2) \leq \mathcal{D}(X_1X_2)$,

$$R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) = R_{X_1X_2}(D_1, D_2). \tag{47}$$

Since $\mathcal{D}(X_1X_2|W) \leq \mathcal{D}(X_1X_2)$, let $\gamma = \mathcal{D}(X_1X_2|W)$; then both
equalities (46) and (47) hold for $0 \leq (D_1, D_2) \leq \gamma$. Therefore,
the conditions in both Lemmas 3 and 4 are satisfied, and we have
$C(D_1, D_2) = C(X_1, X_2)$ for $0 \leq (D_1, D_2) \leq \gamma$. Thus the
proof is completed by Theorem 4. ■

Remarks:

1) Theorem 5 shows that under quite general conditions,
Wyner’s common information is precisely the smallest
common message rate $C_3(D_1, D_2)$ of the Gray-Wyner
network for a certain range of distortion constraints
when the total rate is arbitrarily close to the rate
distortion function with joint decoding. As the common
information is only a function of the joint distribution,
and hence is a constant for a given $p(x_1, x_2)$, it is surprising
that the smallest common rate $C_3(D_1, D_2)$ remains
constant even if the distortion constraints vary, as long
as they are within a specific distortion region.

2) While Theorem 5 establishes that $C_3(D_1, D_2) =
C(X_1, X_2)$ for $(D_1, D_2) \leq \gamma$, it does not specify the
value of the positive distortion surface $\gamma$. By the proof of
Theorem 5, we know $\gamma$ is exactly $\mathcal{D}(X_1X_2|W)$, the critical
region of distortion where $R_{X_1X_2|W}(D_1, D_2)$ equals
its corresponding ESLB $R^{(L)}_{X_1X_2|W}(D_1, D_2)$. Furthermore,
let $\mathcal{D}^c = (D_1^c, D_2^c)$ be the two-dimensional distortion
surface such that $R_{X_1X_2}(D_1^c, D_2^c) = C(X_1, X_2)$;
then we must have

$$\gamma \leq \mathcal{D}^c.$$

This is because if $\gamma > \mathcal{D}^c$, then there exists $(D_1, D_2)$
such that $\gamma \geq (D_1, D_2) > \mathcal{D}^c$ and $C_3(D_1, D_2) \leq
R_{X_1X_2}(D_1, D_2) < R_{X_1X_2}(D_1^c, D_2^c) = C(X_1, X_2)$, which
contradicts Theorem 5.

C. $C_3(D_1, D_2)$ for Successively Refinable Sources

From the second remark of Theorem 5, we know $\gamma \leq \mathcal{D}^c$.
Now let us consider a particular point on the surface $\mathcal{D}^c$,
denoted by $(D_1^0, D_2^0)$ and defined below. We will show that
under the assumption that such a point exists on $\mathcal{D}^c$ and the
sources are successively refinable, $C_3(D_1, D_2)$ equals the
common information for any $(D_1, D_2) \leq (D_1^0, D_2^0)$.

Let $W$ be the auxiliary random variable that achieves
$C(X_1, X_2)$. Suppose there exists a distortion pair $(D_1^0, D_2^0)$
satisfying, for $i = 1, 2$,

$$R_{X_i}(D_i^0) = I(X_i; W),$$

$$D_i^0 = \inf_{\hat{x}_i(w)} E\, d_i(X_i, \hat{x}_i(W)), \tag{48}$$

where $\hat{x}_1(w)$, $\hat{x}_2(w)$ are deterministic functions. Under
this assumption, we can show that $R_{X_1X_2}(D_1^0, D_2^0) =
I(X_1, X_2; W)$, which means $(D_1^0, D_2^0)$ is on the surface $\mathcal{D}^c$.
The joint rate distortion function $R_{X_1X_2}(D_1^0, D_2^0)$ not only
equals the common information but is also achieved by
the auxiliary random variable $W$. Furthermore, it is easy to
establish

$$C_3(D_1^0, D_2^0) = C(X_1, X_2), \tag{49}$$

using Lemma 4 and the fact that $C_3(D_1^0, D_2^0) \leq
R_{X_1X_2}(D_1^0, D_2^0)$. This means that in the Gray-Wyner network,
with the total rate equal to $R_{X_1X_2}(D_1^0, D_2^0)$, the scheme to
transmit the pair of sources $(X_1^n, X_2^n)$ within distortion
constraints $(D_1^0, D_2^0)$ is to communicate $W$ to the two receivers
using the common channel.

Let us now decrease the distortion constraints from
$(D_1^0, D_2^0)$ to $(D_1, D_2) \leq (D_1^0, D_2^0)$. The question is whether
the rate $C(X_1, X_2)$ is $(D_1, D_2)$-achievable, i.e., whether it is
possible to transmit the sources $(X_1^n, X_2^n)$ with smaller
distortions $(D_1, D_2)$ with the sum rate at $R_{X_1X_2}(D_1, D_2)$
while keeping the common rate at $C(X_1, X_2)$. In the following
theorem, we identify a sufficient condition under
which $C_3(D_1, D_2) = C(X_1, X_2)$ for successively refinable
sources. This sufficient condition ensures the optimality of a
two-stage encoding scheme: first encode the common message
with rate $C(X_1, X_2)$, which yields a coarse distortion
$(D_1^0, D_2^0)$, then encode the two private messages with
rates $R_{X_1|W}(D_1)$ and $R_{X_2|W}(D_2)$. The successive refinement
assumption guarantees that the two-step approach achieves the
distortion $(D_1, D_2)$ and the sum rate does not exceed the total
rate $R_{X_1X_2}(D_1, D_2)$.

Theorem 6: Assume the source $(X_1, X_2)$ has the property
that there exist $\hat{x}_1 \in \hat{\mathcal{X}}_1$ and $\hat{x}_2 \in \hat{\mathcal{X}}_2$ such that

$$E\, d_i(X_i, \hat{x}_i) < \infty, \quad i = 1, 2.$$

Let $W$ be the auxiliary variable that achieves $C(X_1, X_2)$ and
$(D_1^0, D_2^0)$ be a distortion pair satisfying (48). If the source
$(X_1, X_2)$ is successively refinable from $(D_1^0, D_2^0)$ to $(D_1, D_2)$
for $(D_1, D_2) \leq (D_1^0, D_2^0)$, and $X_i$ is successively refinable
from $D_i^0$ to $D_i$ for $D_i \leq D_i^0$, $i = 1, 2$, then

$$C_3(D_1, D_2) = C(X_1, X_2).$$

Proof: See Appendix D. ■

In the following section, we will consider two examples
involving successively refinable sources: the binary random
variables and bivariate Gaussian variables. For these two cases,
we compute explicitly the function $C_3(D_1, D_2)$ and establish
its connection with $C(X_1, X_2)$. The distortion pair $(D_1^0, D_2^0)$
satisfying (48) is identified for both cases, thus Theorem 6 can
be directly applied.

IV.  Examples 
A.  Binary  Random  Variables 

Let $S \sim \mathrm{Bern}(\theta)$ for $0 \leq \theta \leq 1$, i.e., $S \in \{0, 1\}$ and
$P(S = 1) = \theta$. Let $X_i$, $i = 1, \ldots, N$, be the output of a
binary symmetric channel (BSC) with crossover probability
$a_1$ ($0 \leq a_1 \leq \frac{1}{2}$) and with $S$ as input. The BSCs are
independent of each other. Thus,

$$p(x_1, \ldots, x_N | s) = \prod_{i=1}^{N} p(x_i | s),$$

where

$$p(x_i | s) = \begin{cases} 1 - a_1, & \text{if } x_i = s, \\ a_1, & \text{otherwise}, \end{cases}$$

for $x_i \in \{0, 1\}$. Therefore, the joint distribution of
$X_1, X_2, \ldots, X_N$ is

$$p(x_1, x_2, \ldots, x_N) = \sum_{s \in \{0,1\}} p(s) \prod_{i=1}^{N} p(x_i | s) = \theta a_1^{N - t_N} (1 - a_1)^{t_N} + (1 - \theta)(1 - a_1)^{N - t_N} a_1^{t_N}, \tag{50}$$

where $t_N = \sum_{i=1}^{N} x_i$.

For $N = 2$, the joint distribution of $X_1, X_2$ is given by the
following probability matrix,

$$\begin{bmatrix} \theta(1 - a_1)^2 + (1 - \theta)a_1^2 & a_1(1 - a_1) \\ a_1(1 - a_1) & \theta a_1^2 + (1 - \theta)(1 - a_1)^2 \end{bmatrix}. \tag{51}$$

It has been shown by Witsenhausen [13] that the common
information of $X_1, X_2$ is achieved with $W$ being $S$. That is,

$$C(X_1, X_2) = I(X_1, X_2; S) = H(X_1, X_2) - 2h(a_1),$$

where $h(\cdot)$ is the binary entropy function. When $\theta = \frac{1}{2}$,
$(X_1, X_2)$ is a Doubly Symmetric Binary Source (DSBS)
whose common information was derived by Wyner [4] using
a different approach.

The above result can be generalized to the common information
for $N$ variables, each of which is the channel output
of a BSC with the common input $S$.

Proposition 1: Let $S \sim \mathrm{Bern}(\theta)$ and let $X_i$, $i = 1, \ldots, N$,
be the output of independent BSCs with common input $S$ and
crossover probability $0 \leq a_1 \leq 1/2$. Then for any $N \geq 2$, the
common information for $X_1, \ldots, X_N$ is given as

$$C(X_1, \ldots, X_N) = I(X_1, \ldots, X_N; S). \tag{52}$$

Proof: That $C(X_1, \ldots, X_N) \leq I(X_1, \ldots, X_N; S)$ follows
from the definition of the common information for multiple
random variables [31]. The inequality $C(X_1, \ldots, X_N) \geq
I(X_1, \ldots, X_N; S)$ can be proved by contradiction. Suppose
there exists a $W$ such that

$$C(X_1, \ldots, X_N) = I(X_1, \ldots, X_N; W) < I(X_1, \ldots, X_N; S), \tag{53}$$

i.e., $C(X_1, \ldots, X_N)$ is achieved by $W$ and it is strictly
less than $I(X_1, \ldots, X_N; S)$. Since $W$ induces conditional
independence of $X_1, \ldots, X_N$, we have, from (53),

$$\sum_{i=1}^{N} H(X_i | W) > \sum_{i=1}^{N} H(X_i | S).$$

Thus, there must exist two random variables $X_k, X_j$,
$k, j \in \{1, \ldots, N\}$, such that

$$H(X_k | W) + H(X_j | W) > H(X_k | S) + H(X_j | S).$$

Given that the sequence $\{X_1, \ldots, X_N\}$ is exchangeable [30],
$p(x_k, x_j)$ has the same joint distribution as $p(x_1, x_2)$. Thus,

$$C(X_1, X_2) = C(X_k, X_j) \leq I(X_k, X_j; W) < I(X_k, X_j; S) = I(X_1, X_2; S).$$

This, however, contradicts the fact that $S$ achieves $C(X_1, X_2)$.
Thus the proposition is proved. ■
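As a numerical illustration of the two-variable case, the following minimal sketch evaluates $C(X_1, X_2) = H(X_1, X_2) - 2h(a_1)$ directly from the joint distribution of two BSC outputs. The function names (`h`, `joint_pmf`, `common_information`) are hypothetical helpers introduced here for illustration, and entropies are assumed to be measured in bits.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def joint_pmf(theta, a1):
    """p(x1, x2) for two independent BSC(a1) outputs of S ~ Bern(theta)."""
    pmf = {}
    for x1 in (0, 1):
        for x2 in (0, 1):
            total = 0.0
            for s, ps in ((0, 1 - theta), (1, theta)):
                p1 = 1 - a1 if x1 == s else a1
                p2 = 1 - a1 if x2 == s else a1
                total += ps * p1 * p2
            pmf[(x1, x2)] = total
    return pmf

def common_information(theta, a1):
    """C(X1, X2) = H(X1, X2) - 2 h(a1), in bits (W = S achieves it)."""
    pmf = joint_pmf(theta, a1)
    joint_entropy = -sum(p * math.log2(p) for p in pmf.values() if p > 0)
    return joint_entropy - 2 * h(a1)

if __name__ == "__main__":
    # DSBS case: theta = 1/2, so a0 = 2 * a1 * (1 - a1).
    print(common_information(0.5, 0.1))
```

For $\theta = 1/2$ the value should agree with $1 + h(a_0) - 2h(a_1)$, where $a_0 = 2a_1(1 - a_1)$ is the probability that the two outputs differ.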

We now characterize the minimum common rate
$C_3(D_1, D_2)$ for a DSBS.

Proposition 2: Consider a DSBS $(X_1, X_2)$ with distribution

$$p(x_1, x_2) = \begin{cases} \frac{1}{2}(1 - a_0), & \text{if } x_1 = x_2, \\ \frac{1}{2} a_0, & \text{otherwise}, \end{cases} \tag{54}$$

where, without loss of generality, $0 \leq a_0 \leq 1/2$. Let $a_1$ be
such that $a_0 = 2a_1(1 - a_1)$, $0 \leq a_1 \leq 1/2$. With Hamming



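distortion $d_1 = d_2 = d_H$, we have

$$C_3(D_1, D_2) = \begin{cases} C(X_1, X_2), & (D_1, D_2) \in \mathcal{E}_{10}, \\ R_{X_1X_2}(D_1, D_2), & (D_1, D_2) \in \mathcal{E}_2 \cup \mathcal{E}_3, \\ 0, & (D_1, D_2) \geq (\frac{1}{2}, \frac{1}{2}), \end{cases} \tag{55}$$

$$C(X_1, X_2) \leq C_3(D_1, D_2) \leq R_{X_1X_2}(D_1, D_2), \quad (D_1, D_2) \in \mathcal{E}_{11}, \tag{56}$$

where

$$\begin{aligned}
\mathcal{E}_{10} &= \{(D_1, D_2) : 0 \leq D_i \leq a_1, \; i = 1, 2\}, \\
\mathcal{E}_{11} &= \mathcal{E}_{10}^c \cap \{(D_1, D_2) : D_1 + D_2 - 2D_1D_2 \leq a_0\}, \\
\mathcal{E}_2 &= \mathcal{E}_{10}^c \cap \mathcal{E}_{11}^c \cap \left\{(D_1, D_2) : \max\left\{\frac{D_1 - D_2}{1 - 2D_2}, \frac{D_2 - D_1}{1 - 2D_1}\right\} \leq a_0\right\}, \\
\mathcal{E}_3 &= \mathcal{E}_{10}^c \cap \mathcal{E}_{11}^c \cap \mathcal{E}_2^c \cap \left\{(D_1, D_2) : D_i \leq \frac{1}{2}, \; i = 1, 2\right\}.
\end{aligned} \tag{57}$$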

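Proof: For $X_i \sim \mathrm{Bern}(1/2)$, $i = 1, 2$, with Hamming
distortion, the rate distortion function is

$$R_{X_i}(D_i) = \begin{cases} 1 - h(D_i), & 0 \leq D_i \leq \frac{1}{2}, \\ 0, & D_i \geq \frac{1}{2}. \end{cases}$$

The joint rate distortion function of the DSBS $(X_1, X_2)$ is
given by [29]

$$R_{X_1X_2}(D_1, D_2) = \begin{cases} 1 + h(a_0) - h(D_1) - h(D_2), & (D_1, D_2) \in \mathcal{E}_1, \\ 1 - (1 - a_0)\, h\!\left(\frac{D_1 + D_2 - a_0}{2(1 - a_0)}\right) - a_0\, h\!\left(\frac{D_1 - D_2 + a_0}{2a_0}\right), & (D_1, D_2) \in \mathcal{E}_2, \\ 1 - h(\min\{D_1, D_2\}), & (D_1, D_2) \in \mathcal{E}_3, \end{cases} \tag{58}$$

where $\mathcal{E}_1 = \mathcal{E}_{10} \cup \mathcal{E}_{11}$ with $\mathcal{E}_{10}$, $\mathcal{E}_{11}$, $\mathcal{E}_2$ and $\mathcal{E}_3$ defined in (57).
Therefore, for this DSBS, $R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) =
R_{X_1X_2}(D_1, D_2)$ for $(D_1, D_2) \in \mathcal{E}_1$. From Lemma 4, we have
for $(D_1, D_2) \in \mathcal{E}_1$,

$$C_3(D_1, D_2) \geq C(X_1, X_2). \tag{59}$$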


On the other hand, let $S$ be the binary random variable
that achieves the common information of $X_1, X_2$. That is,
$S \sim \mathrm{Bern}(1/2)$ and $p(x_i | s) = 1 - a_1$ if $s = x_i$ for $i = 1, 2$.
Then the conditional rate distortion function $R_{X_i|S}(D_i)$ is
given by [18]

$$R_{X_i|S}(D_i) = \begin{cases} h(a_1) - h(D_i), & 0 \leq D_i \leq a_1, \\ 0, & D_i \geq a_1. \end{cases}$$

Therefore, $R_{X_1|S}(D_1) + R_{X_2|S}(D_2) + I(X_1, X_2; S) =
R_{X_1X_2}(D_1, D_2)$ is satisfied for $(D_1, D_2) \in \mathcal{E}_{10}$. From
Theorem 4, $C_3(D_1, D_2) \leq C(X_1, X_2)$ for $(D_1, D_2) \in \mathcal{E}_{10}$.
Together with (59) and given that $\mathcal{E}_{10} \subset \mathcal{E}_1$, we have proved
that for $(D_1, D_2) \in \mathcal{E}_{10}$,

$$C_3(D_1, D_2) = C(X_1, X_2).$$
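The equality condition used in this step can be checked numerically. The sketch below is an illustration under stated assumptions: rates are in bits, the helper names are hypothetical, and the joint rate distortion expression $1 + h(a_0) - h(D_1) - h(D_2)$ is only evaluated at points where $D_1 + D_2 - 2D_1D_2 \leq a_0$. It compares $R_{X_1|S}(D_1) + R_{X_2|S}(D_2) + I(X_1, X_2; S)$ with the joint rate distortion function.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conditional_rd(a1, d):
    """R_{X_i|S}(D_i) for a BSC(a1) output under Hamming distortion."""
    return h(a1) - h(d) if d <= a1 else 0.0

def equality_gap(a1, d1, d2):
    """Left side minus right side of the equality constraint in (36), with W = S.

    Assumes D1 + D2 - 2*D1*D2 <= a0, so that the joint rate distortion
    function equals 1 + h(a0) - h(D1) - h(D2).
    """
    a0 = 2 * a1 * (1 - a1)
    common_info = 1 + h(a0) - 2 * h(a1)
    lhs = conditional_rd(a1, d1) + conditional_rd(a1, d2) + common_info
    joint_rd = 1 + h(a0) - h(d1) - h(d2)
    return lhs - joint_rd

if __name__ == "__main__":
    print(equality_gap(0.1, 0.05, 0.08))  # both distortions at most a1
    print(equality_gap(0.1, 0.15, 0.01))  # D1 exceeds a1
```

The gap vanishes when both distortions are at most $a_1$. When one distortion exceeds $a_1$ the gap is strictly positive, so $W = S$ no longer satisfies the constraint there; this is consistent with only bounds being available in that part of the region.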

For $(D_1, D_2) \in \mathcal{E}_2$, we only need to show that
$C_3(D_1, D_2) \geq R_{X_1X_2}(D_1, D_2)$. It was shown in [29] that
the backward test channel that achieves $R_{X_1X_2}(D_1, D_2)$ is
given by

$$X_1 = \hat{X}_1 + Z_1,$$

$$X_2 = \hat{X}_2 + Z_2,$$

where $(\hat{X}_1, \hat{X}_2)$ and $(Z_1, Z_2)$ are binary vectors independent
of each other with the probability mass functions given
respectively as

$$P_{\hat{X}_1\hat{X}_2} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix},$$

$$P_{Z_1Z_2} = \frac{1}{2} \begin{bmatrix} 2 - a_0 - D_1 - D_2 & D_2 - D_1 + a_0 \\ D_1 - D_2 + a_0 & D_1 + D_2 - a_0 \end{bmatrix}.$$

Therefore, $(\hat{X}_1, \hat{X}_2)$ that achieves $R_{X_1X_2}(D_1, D_2)$ satisfies

$$\hat{X}_2 = \hat{X}_1.$$

For the characterization $C^*(D_1, D_2)$ of $C_3(D_1, D_2)$, any $W$
satisfying the Markov chain $\hat{X}_1 - W - \hat{X}_2$ must satisfy
$H(\hat{X}_1 | W) = 0$. Thus, $\hat{X}_1$ is a function of $W$ and we have

$$\begin{aligned}
I(X_1, X_2; W) &= I(X_1, X_2; W, \hat{X}_1) \\
&\geq I(X_1, X_2; \hat{X}_1) \\
&= R_{X_1X_2}(D_1, D_2).
\end{aligned}$$

Therefore, $C_3(D_1, D_2) = R_{X_1X_2}(D_1, D_2)$.

Fig. 3. The distortion regions for the DSBS. $C_3(D_1, D_2) = C(X_1, X_2)$ in the shaded region.
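The noise distribution of the backward test channel can be sanity-checked numerically. The sketch below is a minimal illustration with a hypothetical helper `noise_pmf`; the assignment of the two off-diagonal entries to $(z_1, z_2) = (0, 1)$ and $(1, 0)$ is an assumption chosen so that $P(Z_i = 1) = D_i$.

```python
def noise_pmf(a0, d1, d2):
    """Joint pmf of the test-channel noise (Z1, Z2), keyed by (z1, z2).

    Indexing is an assumption: it is chosen so that P(Z1 = 1) = d1
    and P(Z2 = 1) = d2.
    """
    return {
        (0, 0): (2 - a0 - d1 - d2) / 2,
        (0, 1): (d2 - d1 + a0) / 2,
        (1, 0): (d1 - d2 + a0) / 2,
        (1, 1): (d1 + d2 - a0) / 2,
    }

def marginals(pmf):
    """Return (P(Z1 = 1), P(Z2 = 1), P(Z1 != Z2))."""
    p_z1 = pmf[(1, 0)] + pmf[(1, 1)]
    p_z2 = pmf[(0, 1)] + pmf[(1, 1)]
    p_diff = pmf[(0, 1)] + pmf[(1, 0)]
    return p_z1, p_z2, p_diff

if __name__ == "__main__":
    # a0 = 0.18 corresponds to a1 = 0.1; (0.12, 0.10) lies outside the
    # region D1 + D2 - 2*D1*D2 <= a0.
    pmf = noise_pmf(0.18, 0.12, 0.10)
    print(pmf, marginals(pmf))
```

Because the two reproductions coincide in this region, $X_1$ and $X_2$ differ exactly when $Z_1$ and $Z_2$ differ, so the third quantity should equal $a_0$, while the first two give the Hamming distortions.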

The region $\mathcal{E}_3$ is a degenerate one. For example,
$R_{X_1X_2}(D_1, D_2) = R_{X_1}(D_1)$ if $a_0 \leq \frac{D_2 - D_1}{1 - 2D_1}$ and $D_i \leq \frac{1}{2}$,
$i = 1, 2$. This implies that the optimal coding scheme is
to ignore $X_2$ and optimally compress $X_1$. Then $X_2$ can be
estimated from $\hat{X}_1$ with distortion less than $D_2$. The case of
$a_0 \leq \frac{D_1 - D_2}{1 - 2D_2}$ is dealt with similarly. Hence, similar to the
region $\mathcal{E}_2$, $C_3(D_1, D_2) = R_{X_1X_2}(D_1, D_2)$. ■

The characterization of $C_3(D_1, D_2)$ is plotted in Fig. 3
as a function of the distortion constraints. $C_3(D_1, D_2) =
C(X_1, X_2)$ in the shaded region. For the symmetric distortion
constraint, $D_1 = D_2 = D$, the relation of $C_3(D, D)$ and $D$
for the DSBS is given in Fig. 4.
Remarks:

• The claim $C_3(D_1, D_2) = C(X_1, X_2)$ for $(D_1, D_2) \in \mathcal{E}_{10}$
can also be proved using Theorem 6. This is because
$R_{X_1X_2}(a_1, a_1)$ is achieved by the backward test channel
$p_b(x_1, x_2 | s) = p(x_1 | s) p(x_2 | s)$. The vector source $(X_1, X_2)$ is
successively refinable for any $(D_1, D_2) \leq (a_1, a_1)$ [29] and the
scalar source $X_i$ is successively refinable for any $D_i \leq a_1$,
$i = 1, 2$ [26]. Thus by Theorem 6, $C_3(D_1, D_2) =
C(X_1, X_2)$ for $(D_1, D_2) \leq (a_1, a_1)$.

• We have the full characterization of $C_3(D_1, D_2)$
in the distortion region except the region $\mathcal{E}_{11}$.
From the proof of Proposition 2, we know that
$C_3(D_1, D_2) \geq C(X_1, X_2)$ for $(D_1, D_2) \in \mathcal{E}_{11}$, but
the exact value of $C_3(D_1, D_2)$ in this region remains
unknown.

• Let $(D_1, D_2) \leq (D_1', D_2') \leq (a_1, a_1)$; then the
rate $R_{X_1X_2}(D_1', D_2')$ is $(D_1, D_2)$-achievable in the
Gray-Wyner network, i.e., $R_{X_1X_2}(D_1', D_2') \geq
C_3(D_1, D_2)$.

To show this, let $(\hat{X}_1, \hat{X}_2)$ achieve $R_{X_1X_2}(D_1', D_2')$. The
backward test channel that achieves $R_{X_1X_2}(D_1', D_2')$ satisfies
$p_b(x_1, x_2 | \hat{x}_1, \hat{x}_2) = p_b(x_1 | \hat{x}_1) p_b(x_2 | \hat{x}_2)$, where

$$p_b(x_i | \hat{x}_i) = \begin{cases} 1 - D_i', & \text{if } x_i = \hat{x}_i, \\ D_i', & \text{otherwise}, \end{cases}$$

for $i = 1, 2$. Then for $(D_1, D_2) \leq (D_1', D_2') \leq (a_1, a_1)$,
let the rate allocation of $R_0, R_1, R_2$ in the Gray-Wyner
network be

$$\begin{aligned}
R_0 &= R_{X_1X_2}(D_1', D_2') = 1 + h(a_0) - h(D_1') - h(D_2'), \\
R_i &= R_{X_i|\hat{X}_1\hat{X}_2}(D_i) = R_{X_i|\hat{X}_i}(D_i) = h(D_i') - h(D_i), \quad i = 1, 2.
\end{aligned} \tag{60}$$

Since $R_0$, $R_1$ and $R_2$ in (60) sum up to $R_{X_1X_2}(D_1, D_2)$,
$R_{X_1X_2}(D_1', D_2')$ is $(D_1, D_2)$-achievable.

The minimal $R_0$ satisfying (60) is exactly $C(X_1, X_2)$,
which is achieved by letting $(D_1', D_2') = (a_1, a_1)$.
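The rate allocation in the last remark can be illustrated with a short numerical sketch. The helper `rate_allocation` is hypothetical, rates are assumed to be in bits, and the inputs are assumed to satisfy $(D_1, D_2) \leq (D_1', D_2') \leq (a_1, a_1)$.

```python
import math

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_allocation(a1, d, d_prime):
    """Return (R0, R1, R2) for targets d = (D1, D2) and coarse point d_prime.

    Assumes d <= d_prime <= (a1, a1) componentwise.
    """
    a0 = 2 * a1 * (1 - a1)
    r0 = 1 + h(a0) - h(d_prime[0]) - h(d_prime[1])   # common rate
    r1 = h(d_prime[0]) - h(d[0])                     # private refinement for X1
    r2 = h(d_prime[1]) - h(d[1])                     # private refinement for X2
    return r0, r1, r2

if __name__ == "__main__":
    a1 = 0.1
    target = (0.02, 0.05)
    for coarse in ((0.06, 0.08), (a1, a1)):
        r0, r1, r2 = rate_allocation(a1, target, coarse)
        print(coarse, r0, r0 + r1 + r2)
```

The sum rate is the same for every admissible coarse point, while the common rate $R_0$ decreases as the coarse point grows and reaches $1 + h(a_0) - 2h(a_1)$ at $(a_1, a_1)$.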


B.  Gaussian  Random  Variables 

In this section we consider bivariate Gaussian random
variables $X_1, X_2$ with zero mean and covariance matrix

$$K_2 = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}. \tag{61}$$

The common information between this pair of Gaussian
random variables is given in the following theorem.

Theorem 7: For two jointly Gaussian random variables
$X_1, X_2$ with covariance matrix $K_2$, the common
information is

$$C(X_1, X_2) = \frac{1}{2} \log \frac{1 + \rho}{1 - \rho}. \tag{62}$$

Proof: See Appendix E. ■

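The above result generalizes to multi-variate Gaussian random
variables satisfying a certain covariance matrix structure,
the proof of which can be constructed in a similar fashion.

Theorem 8: For $N$ jointly Gaussian random variables
$X_1, X_2, \ldots, X_N$ with covariance matrix $K_N$,

$$K_N = \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}, \tag{63}$$

the common information is

$$C(X_1, X_2, \ldots, X_N) = \frac{1}{2} \log \left(1 + \frac{N\rho}{1 - \rho}\right). \tag{64}$$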
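A quick numerical cross-check of the $N$-variable case is sketched below. It assumes $0 < \rho < 1$ and that the common variable enters through $X_i = \sqrt{\rho}\, W + \sqrt{1 - \rho}\, N_i$ for every $i$ (an assumption extending the two-variable decomposition), and it compares $I(X_1, \ldots, X_N; W)$, computed from the log-determinant of the equicorrelated covariance matrix, with the closed form $\frac{1}{2} \log\left(1 + \frac{N\rho}{1 - \rho}\right)$. All function names are hypothetical and rates are in nats.

```python
import math

def equicorrelated(n, rho):
    """n x n covariance matrix with unit diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def log_det(matrix):
    """Log-determinant of a positive definite matrix via Gaussian elimination."""
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for k in range(n):
        pivot = a[k][k]
        total += math.log(pivot)
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return total

def info_through_w(n, rho):
    """I(X_1..X_n; W) = h(X) - h(X | W) in nats for the assumed decomposition."""
    return 0.5 * (log_det(equicorrelated(n, rho)) - n * math.log(1 - rho))

def closed_form(n, rho):
    """The closed form (1/2) log(1 + n*rho / (1 - rho)) in nats."""
    return 0.5 * math.log(1 + n * rho / (1 - rho))

if __name__ == "__main__":
    for n in range(2, 6):
        print(n, info_through_w(n, 0.3), closed_form(n, 0.3))
```

For $N = 2$ the closed form reduces to $\frac{1}{2} \log \frac{1 + \rho}{1 - \rho}$, the bivariate expression.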


We now characterize the minimum common rate
$C_3(D_1, D_2)$ in the Gray-Wyner lossy source coding network
for bivariate Gaussian random variables with covariance
matrix $K_2$ in equation (61). It was shown in [16] that for
symmetric distortion, i.e., $D_1 = D_2 = D$,

$$C_3(D, D) = \begin{cases} C(X_1, X_2), & 0 \leq D \leq 1 - \rho, \\ R_{X_1X_2}(D, D), & 1 - \rho \leq D \leq 1, \\ 0, & D \geq 1. \end{cases} \tag{65}$$
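The symmetric characterization can be tabulated with a few lines of code. The sketch below assumes unit variances, $0 < \rho < 1$, $D > 0$, and rates in nats. For $1 - \rho < D < 1$ it uses the expression $\frac{1}{2} \log \frac{1 + \rho}{2D - (1 - \rho)}$, which is what the standard joint rate distortion function of a bivariate Gaussian source reduces to when $D_1 = D_2 = D$; that simplification is worked out here and is not quoted from the paper. The function names are hypothetical.

```python
import math

def joint_rd_symmetric(rho, d):
    """R_{X1X2}(D, D) in nats for unit-variance Gaussians with correlation rho."""
    if d >= 1.0:
        return 0.0
    if d <= 1.0 - rho:
        return 0.5 * math.log((1 - rho ** 2) / (d * d))
    # Simplified form of the middle-region expression when D1 = D2 = D.
    return 0.5 * math.log((1 + rho) / (2 * d - (1 - rho)))

def c3_symmetric(rho, d):
    """Minimum common rate C_3(D, D) in nats for symmetric distortion."""
    if d <= 1.0 - rho:
        return 0.5 * math.log((1 + rho) / (1 - rho))  # common information
    if d < 1.0:
        return joint_rd_symmetric(rho, d)
    return 0.0

if __name__ == "__main__":
    for d in (0.1, 0.5, 0.75, 0.99, 1.0):
        print(d, c3_symmetric(0.5, d))
```

The two branches agree at $D = 1 - \rho$ and the rate falls to zero as $D$ approaches 1, so the resulting curve is continuous and non-increasing in $D$.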

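We characterize $C_3(D_1, D_2)$ for general distortion $(D_1, D_2)$
in the following proposition.

Proposition 3: For bivariate Gaussian random variables
$X_1, X_2$ with zero mean, covariance matrix $K_2$ and squared
error distortion, we have that

$$C_3(D_1, D_2) = \begin{cases} C(X_1, X_2), & (D_1, D_2) \in \mathcal{D}_{10}, \\ R_{X_1X_2}(D_1, D_2), & (D_1, D_2) \in \mathcal{D}_2 \cup \mathcal{D}_3, \\ 0, & (D_1, D_2) \geq (1, 1), \end{cases} \tag{66}$$

$$C(X_1, X_2) \leq C_3(D_1, D_2) \leq R_{X_1X_2}(D_1, D_2), \quad (D_1, D_2) \in \mathcal{D}_{11}, \tag{67}$$

where

$$\begin{aligned}
\mathcal{D}_{10} &= \{(D_1, D_2) : 0 \leq D_i \leq 1 - \rho, \; i = 1, 2\}, \\
\mathcal{D}_{11} &= \mathcal{D}_{10}^c \cap \{(D_1, D_2) : D_1 + D_2 - D_1D_2 \leq 1 - \rho^2\}, \\
\mathcal{D}_2 &= \mathcal{D}_{10}^c \cap \mathcal{D}_{11}^c \cap \left\{(D_1, D_2) : \min\left\{\frac{1 - D_1}{1 - D_2}, \frac{1 - D_2}{1 - D_1}\right\} \geq \rho^2\right\}, \\
\mathcal{D}_3 &= \mathcal{D}_{10}^c \cap \mathcal{D}_{11}^c \cap \mathcal{D}_2^c \cap \{(D_1, D_2) : D_i \leq 1, \; i = 1, 2\}.
\end{aligned} \tag{68}$$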
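Proof: The joint rate distortion function for Gaussian
random variables with squared error distortion [27]-[29] is
given by

$$R_{X_1X_2}(D_1, D_2) = \begin{cases} \frac{1}{2} \log \frac{1 - \rho^2}{D_1D_2}, & (D_1, D_2) \in \mathcal{D}_1, \\ \frac{1}{2} \log \frac{1 - \rho^2}{D_1D_2 - \left(\rho - \sqrt{(1 - D_1)(1 - D_2)}\right)^2}, & (D_1, D_2) \in \mathcal{D}_2, \\ \frac{1}{2} \log \frac{1}{\min(D_1, D_2)}, & (D_1, D_2) \in \mathcal{D}_3, \end{cases} \tag{69}$$

where $\mathcal{D}_1 = \mathcal{D}_{10} \cup \mathcal{D}_{11}$. The marginal rate distortion function
for $X_i \sim \mathcal{N}(0, 1)$, $i = 1, 2$, is

$$R_{X_i}(D_i) = \begin{cases} \frac{1}{2} \log \frac{1}{D_i}, & 0 \leq D_i \leq 1, \\ 0, & D_i \geq 1. \end{cases}$$

Therefore, $R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) =
R_{X_1X_2}(D_1, D_2)$ for $(D_1, D_2) \in \mathcal{D}_1$. From Lemma 4,
for $(D_1, D_2) \in \mathcal{D}_1$,

$$C_3(D_1, D_2) \geq C(X_1, X_2).$$

On the other hand, the random variable $W$ in the following
decomposition of $X_1$ and $X_2$ achieves the common
information:

$$X_i = \sqrt{\rho}\, W + \sqrt{1 - \rho}\, N_i, \quad i = 1, 2, \tag{70}$$

where $W, N_1, N_2$ are mutually independent standard Gaussian
random variables. The conditional distribution of $X_i$ given $W$ is
Gaussian with variance $1 - \rho$. Hence, for $i = 1, 2$,
the conditional rate distortion function is

$$R_{X_i|W}(D_i) = \begin{cases} \frac{1}{2} \log \frac{1 - \rho}{D_i}, & 0 \leq D_i \leq 1 - \rho, \\ 0, & D_i \geq 1 - \rho. \end{cases} \tag{71}$$

The condition $R_{X_1|W}(D_1) + R_{X_2|W}(D_2) + I(X_1, X_2; W) =
R_{X_1X_2}(D_1, D_2)$ is satisfied for $(D_1, D_2) \in \mathcal{D}_{10}$. From
Theorem 4, $C_3(D_1, D_2) \leq C(X_1, X_2)$ for $(D_1, D_2) \in \mathcal{D}_{10}$.
Since $\mathcal{D}_{10} \subset \mathcal{D}_1$, we have proved that for $(D_1, D_2) \in \mathcal{D}_{10}$,

$$C_3(D_1, D_2) = C(X_1, X_2).$$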
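The equality condition invoked in this step can be verified numerically. The following sketch assumes unit variances, $0 < \rho < 1$, rates in nats, and points where $D_1 + D_2 - D_1D_2 \leq 1 - \rho^2$, so that the joint rate distortion function is $\frac{1}{2} \log \frac{1 - \rho^2}{D_1D_2}$. The helper names are hypothetical.

```python
import math

def conditional_rd(rho, d):
    """R_{X_i|W}(D_i) in nats: X_i given W is Gaussian with variance 1 - rho."""
    return 0.5 * math.log((1 - rho) / d) if d <= 1 - rho else 0.0

def common_information(rho):
    """C(X1, X2) = (1/2) log((1 + rho) / (1 - rho)) in nats."""
    return 0.5 * math.log((1 + rho) / (1 - rho))

def equality_gap(rho, d1, d2):
    """R_{X1|W}(D1) + R_{X2|W}(D2) + I(X1, X2; W) - R_{X1X2}(D1, D2).

    Assumes D1 + D2 - D1*D2 <= 1 - rho**2.
    """
    joint_rd = 0.5 * math.log((1 - rho ** 2) / (d1 * d2))
    lhs = conditional_rd(rho, d1) + conditional_rd(rho, d2) + common_information(rho)
    return lhs - joint_rd

if __name__ == "__main__":
    print(equality_gap(0.5, 0.2, 0.4))  # both distortions at most 1 - rho
    print(equality_gap(0.5, 0.7, 0.1))  # D1 exceeds 1 - rho
```

The gap is zero when both distortions are at most $1 - \rho$ and strictly positive once one of them exceeds $1 - \rho$, so this particular $W$ satisfies the constraint only in the former case.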


For $(D_1, D_2) \in \mathcal{D}_2$, it was shown in [29] that $(\hat{X}_1, \hat{X}_2)$ that
achieves $R_{X_1X_2}(D_1, D_2)$ satisfies

$$\hat{X}_2 = \sqrt{\frac{1 - D_2}{1 - D_1}}\, \hat{X}_1.$$

Hence, using the characterization $C^*(D_1, D_2)$, it is easy to
show that the $W$ satisfying the Markov chains (37) and (38)
must satisfy two Markov chains

$$X_1X_2 - \hat{X}_1 - W - \hat{X}_2,$$

$$X_1X_2 - \hat{X}_2 - W - \hat{X}_1.$$

Therefore, we have

$$I(X_1, X_2; W) = I(X_1, X_2; \hat{X}_1) = I(X_1, X_2; \hat{X}_1, \hat{X}_2),$$

which proves $C_3(D_1, D_2) = R_{X_1X_2}(D_1, D_2)$.

The region $\mathcal{D}_3$ is a degenerate one. For example,
$R_{X_1X_2}(D_1, D_2) = R_{X_1}(D_1)$ if $\frac{1 - D_2}{1 - D_1} \leq \rho^2$; this means
that the correlation between $X_1$ and $X_2$ is so strong that
the optimal coding scheme is to encode $X_1$ to within
distortion $D_1$ and ignore $X_2$. Then $X_2$ can be estimated
from $\hat{X}_1$. We have

$$\hat{X}_2 = \rho \hat{X}_1.$$

The case of $\frac{1 - D_1}{1 - D_2} \leq \rho^2$ is dealt with similarly. Hence, we
have $C_3(D_1, D_2) = R_{X_1X_2}(D_1, D_2)$. ■

Fig. 5. The distortion regions $\mathcal{D}_{11}$, $\mathcal{D}_2$ and $\mathcal{D}_3$ for bivariate Gaussian
random variables. $C_3(D_1, D_2) = C(X_1, X_2)$ in the shaded region.

The characterization of $C_3(D_1, D_2)$ is plotted in Fig. 5
as a function of the distortion constraints. $C_3(D_1, D_2) =
C(X_1, X_2)$ in the shaded region.

Remarks:

• Similar to the binary case, the claim $C_3(D_1, D_2) =
C(X_1, X_2)$ for $(D_1, D_2) \in \mathcal{D}_{10}$ can also be proved
using Theorem 6. This is because for the bivariate
Gaussian random variables with covariance matrix $K_2$,
$R_{X_1X_2}(1 - \rho, 1 - \rho)$ is achieved by the backward test
channel $p_b(x_1, x_2 | w) = p(x_1 | w) p(x_2 | w)$, $(X_1, X_2)$ is
successively refinable for any $(D_1, D_2) \leq (1 - \rho,
1 - \rho)$ [29] and $X_i$ is successively refinable for $D_i \leq 1 - \rho$,
$i = 1, 2$ [26].

• Similarly, $C_3(D_1, D_2) \geq C(X_1, X_2)$ for $(D_1, D_2) \in \mathcal{D}_{11}$,
but the exact characterization of $C_3(D_1, D_2)$ remains
unknown in this region.

• Let $(D_1, D_2) \leq (D_1', D_2') \leq (1 - \rho, 1 - \rho)$; then the
rate $R_{X_1X_2}(D_1', D_2')$ is $(D_1, D_2)$-achievable in the Gray-Wyner
network, i.e., $R_{X_1X_2}(D_1', D_2') \geq C_3(D_1, D_2)$.

This is because for $(D_1', D_2') \in \mathcal{D}_{10}$, the joint rate distortion
function $R_{X_1X_2}(D_1', D_2')$ is achieved by Gaussian
distributed $(\hat{X}_1, \hat{X}_2)$ satisfying $X_1 - \hat{X}_1 - \hat{X}_2 - X_2$, where
the covariance matrix of $(\hat{X}_1, \hat{X}_2)$ is [29]

$$K_{\hat{X}_1\hat{X}_2} = \begin{bmatrix} 1 - D_1' & \rho \\ \rho & 1 - D_2' \end{bmatrix}.$$

Then for $(D_1, D_2) \leq (D_1', D_2') \leq (1 - \rho, 1 - \rho)$, let the
rate allocation of $R_0, R_1, R_2$ for the Gray-Wyner network
be as follows:

$$\begin{aligned}
R_0 &= R_{X_1X_2}(D_1', D_2') = \frac{1}{2} \log \frac{1 - \rho^2}{D_1'D_2'}, \\
R_i &= R_{X_i|\hat{X}_1\hat{X}_2}(D_i) = R_{X_i|\hat{X}_i}(D_i) = \frac{1}{2} \log \frac{D_i'}{D_i}, \quad i = 1, 2.
\end{aligned} \tag{72}$$

$R_0$, $R_1$ and $R_2$ in (72) sum up to $R_{X_1X_2}(D_1, D_2)$, so
$R_{X_1X_2}(D_1', D_2')$ is $(D_1, D_2)$-achievable.

Therefore, in the Gray-Wyner network, we can use
the rate allocation in (72) to achieve the distortion
$(D_1, D_2) \leq (1 - \rho, 1 - \rho)$ for any $(D_1, D_2) \leq (D_1', D_2') \leq
(1 - \rho, 1 - \rho)$. The minimal $R_0$ satisfying (72) is exactly
$C(X_1, X_2)$, which is achieved by letting $(D_1', D_2') =
(1 - \rho, 1 - \rho)$.
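The Gaussian rate allocation above admits the same kind of numerical check as in the binary case. The sketch assumes unit variances, $0 < \rho < 1$, rates in nats, and $(D_1, D_2) \leq (D_1', D_2') \leq (1 - \rho, 1 - \rho)$; the function name `gaussian_rate_allocation` is hypothetical.

```python
import math

def gaussian_rate_allocation(rho, d, d_prime):
    """Return (R0, R1, R2) in nats for targets d and coarse point d_prime.

    Assumes d <= d_prime <= (1 - rho, 1 - rho) componentwise.
    """
    r0 = 0.5 * math.log((1 - rho ** 2) / (d_prime[0] * d_prime[1]))  # common rate
    r1 = 0.5 * math.log(d_prime[0] / d[0])                            # refine X1
    r2 = 0.5 * math.log(d_prime[1] / d[1])                            # refine X2
    return r0, r1, r2

if __name__ == "__main__":
    rho = 0.5
    target = (0.1, 0.2)
    for coarse in ((0.3, 0.4), (1 - rho, 1 - rho)):
        r0, r1, r2 = gaussian_rate_allocation(rho, target, coarse)
        print(coarse, r0, r0 + r1 + r2)
```

The sum rate equals $\frac{1}{2} \log \frac{1 - \rho^2}{D_1D_2}$ for every admissible coarse point, and the common rate at $(1 - \rho, 1 - \rho)$ equals $\frac{1}{2} \log \frac{1 + \rho}{1 - \rho}$.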

V.  Conclusion 

We have generalized the definition of Wyner’s common
information and expanded its practical significance by providing
a new operational interpretation. We have derived a lossy
source coding interpretation of Wyner’s common information
using the Gray-Wyner network. In particular, it is established
that Wyner’s common information is precisely the smallest
common message rate when the total rate is arbitrarily close
to the rate distortion function with joint decoding. A surprising
observation is that such equality holds independent of the
values of distortion constraints as long as the distortions are
within some distortion region. Two examples, the doubly symmetric
binary source under Hamming distortion and the bivariate
Gaussian source under squared-error distortion, are used to
illustrate the lossy source coding interpretation of Wyner’s
common information. The common information for the bivariate
Gaussian source and its extension to the multi-variate case
have also been computed explicitly.

While the lossy source coding interpretation of Wyner’s
common information presented in this paper is limited to
$N = 2$ random variables, the results can be extended to
arbitrary $N$ random variables in a straightforward manner.

Appendix  A 
Proof  of  Theorem  4 

We first show that $C_3(D_1, D_2) \geq C(D_1, D_2)$. Let $R_0$ be
$(D_1, D_2)$-achievable; then for any $\epsilon > 0$, there exists an
$(n, M_0, M_1, M_2, \Delta_1, \Delta_2)$ code such that

$$M_0 \leq 2^{nR_0}, \tag{73}$$

$$\sum_{i=0}^{2} \frac{1}{n} \log M_i \leq R_{X_1X_2}(D_1, D_2) + \epsilon, \tag{74}$$

$$\Delta_1 \leq D_1 + \epsilon, \quad \Delta_2 \leq D_2 + \epsilon. \tag{75}$$

Let $R_i' = \frac{1}{n} \log M_i$ for $i = 0, 1, 2$; then we know that
$(R_0', R_1', R_2')$ is $(D_1, D_2)$-achievable. From Theorem 3, there
exists a $W$ such that

$$R_0' \geq I(X_1, X_2; W), \tag{76}$$

$$R_i' \geq R_{X_i|W}(D_i), \quad i = 1, 2. \tag{77}$$

Therefore, for any $\epsilon > 0$, we have

$$\begin{aligned}
R_{X_1X_2}(D_1, D_2) + \epsilon &\geq \sum_{i=0}^{2} R_i' && (78) \\
&\geq I(X_1, X_2; W) + \sum_{i=1}^{2} R_{X_i|W}(D_i) && (79) \\
&\geq I(X_1, X_2; W) + R_{X_1X_2|W}(D_1, D_2) && (80) \\
&\geq R_{X_1X_2}(D_1, D_2), && (81)
\end{aligned}$$

where (78) is from the inequality (74) and the definitions
of $R_i'$, $i = 0, 1, 2$, (79) is from (76) and (77), (80) is from
(11b) and (81) comes from (10b).

Let $\epsilon \to 0$; then the left-hand side (LHS) and right-hand
side (RHS) of the above inequalities become the same, so all the
inequalities must be equalities. Thus, we have

$$I(X_1, X_2; W) + R_{X_1|W}(D_1) + R_{X_2|W}(D_2) = R_{X_1X_2}(D_1, D_2). \tag{82}$$

Hence, if $R_0$ is $(D_1, D_2)$-achievable, there exists a $W$ such
that $R_0 \geq I(X_1, X_2; W)$ and (82) is true. It shows that
$C_3(D_1, D_2) \geq C(D_1, D_2)$.

Next we show $C_3(D_1, D_2) \leq C(D_1, D_2)$. Let $W'$ be
any random variable satisfying the equality condition in the
optimization problem (36). For any $R_0 > I(X_1, X_2; W')$ and
$\epsilon > 0$, let

$$\epsilon_1 = \min\left\{\frac{\epsilon}{3}, \; R_0 - I(X_1, X_2; W')\right\}, \tag{83}$$

and hence $\epsilon_1 > 0$.

From Theorem 3, since the rate triple
$(I(X_1, X_2; W'), R_{X_1|W'}(D_1), R_{X_2|W'}(D_2))$ is $(D_1, D_2)$-achievable,
there exists an $(n, M_0, M_1, M_2, \Delta_1, \Delta_2)$ code
such that

$$\frac{1}{n} \log M_0 \leq I(X_1, X_2; W') + \epsilon_1 \leq R_0, \tag{84}$$

$$\frac{1}{n} \log M_i \leq R_{X_i|W'}(D_i) + \epsilon_1, \quad i = 1, 2. \tag{85}$$

Summing over (84) and (85), we get

$$\begin{aligned}
\sum_{i=0}^{2} \frac{1}{n} \log M_i &\leq I(X_1, X_2; W') + \sum_{i=1}^{2} R_{X_i|W'}(D_i) + 3\epsilon_1 \\
&= R_{X_1X_2}(D_1, D_2) + 3\epsilon_1 && (86) \\
&\leq R_{X_1X_2}(D_1, D_2) + \epsilon, && (87)
\end{aligned}$$

where (86) follows from the fact that $W'$ satisfies the equality
condition in the optimization problem (36) and inequality (87)
is from (83).

This proves that $R_0$ is $(D_1, D_2)$-achievable, which completes
the proof of $C_3(D_1, D_2) \leq C(D_1, D_2)$.

Appendix B

Direct Proof of $C(D_1, D_2) = C^*(D_1, D_2)$

First we show that $C(D_1, D_2) \geq C^*(D_1, D_2)$.
Let $W$ be any random variable satisfying the equality
condition in the optimization problem (36) and let $\hat{X}_1, \hat{X}_2$ be
random variables that achieve $R_{X_1|W}(D_1)$ and $R_{X_2|W}(D_2)$, i.e.,

$$R_{X_1X_2}(D_1, D_2) = I(X_1, X_2; W) + R_{X_1|W}(D_1) + R_{X_2|W}(D_2), \tag{88}$$

$$R_{X_1|W}(D_1) = I(X_1; \hat{X}_1 | W), \tag{89}$$

$$R_{X_2|W}(D_2) = I(X_2; \hat{X}_2 | W), \tag{90}$$

$$E[d_1(X_1, \hat{X}_1)] \leq D_1, \tag{91}$$

$$E[d_2(X_2, \hat{X}_2)] \leq D_2. \tag{92}$$

Without loss of generality, we can assume that the joint
distribution of $(X_1, X_2, \hat{X}_1, \hat{X}_2, W)$ factors as

$$p(x_1, x_2, \hat{x}_1, \hat{x}_2, w) = p(x_1, x_2, w)\, p(\hat{x}_1 | x_1, w)\, p(\hat{x}_2 | x_2, w),$$

because the distortion $D_1$ is independent of $\hat{X}_2$ and $D_2$ is
independent of $\hat{X}_1$. To establish

$$R_{X_1X_2|W}(D_1, D_2) = R_{X_1|W}(D_1) + R_{X_2|W}(D_2), \tag{93}$$

we combine (88) and the inequalities below

$$R_{X_1X_2|W}(D_1, D_2) + I(X_1, X_2; W) \geq R_{X_1X_2}(D_1, D_2),$$

$$R_{X_1X_2|W}(D_1, D_2) \leq R_{X_1|W}(D_1) + R_{X_2|W}(D_2),$$

from Lemma 1.

Therefore, together with (88)-(92), we have

$$\begin{aligned}
R_{X_1X_2|W}(D_1, D_2) &= I(X_1; \hat{X}_1 | W) + I(X_2; \hat{X}_2 | W) \\
&= H(X_1 | W) + H(X_2 | W) - H(X_1 | \hat{X}_1, W) - H(X_2 | \hat{X}_2, W) \\
&\geq H(X_1, X_2 | W) - H(X_1 | \hat{X}_1, W) - H(X_2 | \hat{X}_2, W) \\
&= H(X_1, X_2 | W) - H(X_1 | W, \hat{X}_1, \hat{X}_2) - H(X_2 | W, \hat{X}_1, \hat{X}_2) \\
&= I(X_1, X_2; \hat{X}_1, \hat{X}_2 | W) \\
&\geq R_{X_1X_2|W}(D_1, D_2).
\end{aligned}$$

As the LHS and RHS of the above inequalities are the same,
all the inequalities must be equalities, so we have

$$I(X_1; X_2 | W) = 0.$$

Furthermore we have

$$\begin{aligned}
R_{X_1X_2}(D_1, D_2) &= I(X_1, X_2; W) + I(X_1; \hat{X}_1 | W) + I(X_2; \hat{X}_2 | W) \\
&= I(X_1, X_2; W, \hat{X}_1, \hat{X}_2) - I(X_1, X_2; \hat{X}_1, \hat{X}_2 | W) + I(X_1; \hat{X}_1 | W) + I(X_2; \hat{X}_2 | W) \\
&= I(X_1, X_2; \hat{X}_1, \hat{X}_2) + I(X_1, X_2; W | \hat{X}_1, \hat{X}_2) \\
&\geq I(X_1, X_2; \hat{X}_1, \hat{X}_2) \\
&\geq R_{X_1X_2}(D_1, D_2).
\end{aligned}$$

The LHS and RHS of the above inequalities are the same, so all
the inequalities must be equalities and we have

$$I(X_1, X_2; W | \hat{X}_1, \hat{X}_2) = 0,$$

$$I(X_1, X_2; \hat{X}_1, \hat{X}_2) = R_{X_1X_2}(D_1, D_2).$$

Therefore, $X_1, X_2, \hat{X}_1, \hat{X}_2, W$ satisfy the Markov chains in
(37) and (38) and $\hat{X}_1, \hat{X}_2$ achieve $R_{X_1X_2}(D_1, D_2)$. Thus,
$C(D_1, D_2) \geq C^*(D_1, D_2)$.


Next we show that $C(D_1, D_2) \le C^*(D_1, D_2)$.

Let $X_1, X_2, X_1^*, X_2^*, W$ achieve $C^*(D_1, D_2)$. Therefore,
they satisfy the Markov chains in (37) and (38) and
$I(X_1, X_2; X_1^*, X_2^*) = R_{X_1X_2}(D_1, D_2)$ and $E[d_1(X_1, X_1^*)] \le D_1$,
$E[d_2(X_2, X_2^*)] \le D_2$.

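\begin{align}
R_{X_1X_2}(D_1, D_2)
&= I(X_1, X_2; X_1^*, X_2^*) \nonumber \\
&= I(X_1, X_2; W, X_1^*, X_2^*) \tag{94} \\
&= I(X_1, X_2; W) + I(X_1, X_2; X_1^*, X_2^* | W) \nonumber \\
&= I(X_1, X_2; W) + H(X_1^* | W) + H(X_2^* | W) \tag{95} \\
&\quad - H(X_1^*, X_2^* | X_1, X_2, W) \tag{96} \\
&= I(X_1, X_2; W) + I(X_1; X_1^* | W) + I(X_2; X_2^* | W) \nonumber \\
&\quad + H(X_1^* | X_1, W) + H(X_2^* | X_2, W) - H(X_1^*, X_2^* | X_1, X_2, W) \nonumber \\
&\ge I(X_1, X_2; W) + I(X_1; X_1^* | W) + I(X_2; X_2^* | W) \nonumber \\
&\quad + H(X_1^* | X_1, X_2, W) + H(X_2^* | X_1, X_2, W) \nonumber \\
&\quad - H(X_1^*, X_2^* | X_1, X_2, W) \tag{97} \\
&= I(X_1, X_2; W) + I(X_1; X_1^* | W) + I(X_2; X_2^* | W) \nonumber \\
&\quad + I(X_1^*; X_2^* | X_1, X_2, W) \nonumber \\
&\ge I(X_1, X_2; W) + I(X_1; X_1^* | W) + I(X_2; X_2^* | W) \nonumber \\
&\ge I(X_1, X_2; W) + R_{X_1|W}(D_1) + R_{X_2|W}(D_2) \nonumber \\
&\ge I(X_1, X_2; W) + R_{X_1X_2|W}(D_1, D_2) \tag{98} \\
&\ge R_{X_1X_2}(D_1, D_2), \tag{99}
\end{align}

where (94) is from the Markov chain $(X_1, X_2) - (X_1^*, X_2^*) - W$,
(96) is from the Markov chain $X_1^* - W - X_2^*$, (97) is
because conditioning reduces entropy, and (98) and (99) are by the
properties of rate distortion functions. As the LHS and RHS
of the above inequalities are the same, all the inequalities must
be equalities, so we have

\[
I(X_1, X_2; W) + R_{X_1|W}(D_1) + R_{X_2|W}(D_2) = R_{X_1X_2}(D_1, D_2).
\]

Therefore, $C^*(D_1, D_2) = I(X_1, X_2; W) \ge C(D_1, D_2)$.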

Appendix  C 
Proof  of  Lemma  4 

Let $W$ be any random variable satisfying the equality
condition in the optimization problem (36), that is

\begin{equation}
R_{X_1|W}(D_1) + R_{X_2|W}(D_2) + I(X_1, X_2; W) = R_{X_1X_2}(D_1, D_2). \tag{100}
\end{equation}

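Combined with (41), we have that

\begin{align}
R_{X_1}(D_1) &+ R_{X_2}(D_2) - I(X_1; X_2) \nonumber \\
&= R_{X_1|W}(D_1) + R_{X_2|W}(D_2) + I(X_1, X_2; W) \tag{101} \\
&\ge R_{X_1}(D_1) - I(X_1; W) + R_{X_2}(D_2) - I(X_2; W) + I(X_1, X_2; W) \tag{102} \\
&= R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) + I(X_1; X_2 | W) \tag{103} \\
&\ge R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2), \tag{104}
\end{align}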

where equation (101) is from equations (100) and (41),
inequality (102) comes from Lemma 1, (103) is by the chain
rule and inequality (104) is by the fact that $I(X_1; X_2 | W) \ge 0$.
Because the LHS of (101) is the same as the RHS
of (104), we can conclude that all the inequalities above must
be equalities. This implies $I(X_1; X_2 | W) = 0$. Therefore,
$C(D_1, D_2) \ge C(X_1, X_2)$, which completes the proof.
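The chain-rule step from (102) to (103) rests on the identity $I(X_1, X_2; W) - I(X_1; W) - I(X_2; W) = I(X_1; X_2 | W) - I(X_1; X_2)$, which holds for any joint distribution. The following is a minimal numerical sketch of that identity on a randomly generated discrete pmf; the alphabet sizes, the seed, and the helper names `entropy`, `cond_mi` and `random_joint_pmf` are illustrative assumptions and not part of the paper.

```python
import itertools
import math
import random


def entropy(pmf, idx):
    """Entropy (nats) of the coordinates `idx` of a joint pmf {outcome: prob}."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(q * math.log(q) for q in marginal.values() if q > 0)


def cond_mi(pmf, a, b, c=()):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def random_joint_pmf(sizes, seed=0):
    """Random strictly positive pmf on a product alphabet (hypothetical test data)."""
    rng = random.Random(seed)
    outcomes = list(itertools.product(*[range(n) for n in sizes]))
    weights = [rng.random() + 1e-3 for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


# Coordinate 0 plays the role of X1, coordinate 1 of X2, coordinate 2 of W.
pmf = random_joint_pmf((2, 3, 2))
X1, X2, W = (0,), (1,), (2,)

# Left side: the W-dependent terms appearing in (102).
lhs = cond_mi(pmf, X1 + X2, W) - cond_mi(pmf, X1, W) - cond_mi(pmf, X2, W)
# Right side: the terms they are replaced by in (103).
rhs = cond_mi(pmf, X1, X2, W) - cond_mi(pmf, X1, X2)
print(abs(lhs - rhs))
```

Because the identity is distribution-free, any other seed or alphabet size should give agreement up to floating-point rounding.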

Appendix  D 
Proof  of  Theorem  6 

From Theorem 4, we only need to prove $C(D_1, D_2) = C(X_1, X_2)$.

First we show that for any $(D_1, D_2) \le (D_1^0, D_2^0)$,

\begin{equation}
R_{X_1X_2|W}(D_1, D_2) + I(X_1, X_2; W) = R_{X_1X_2}(D_1, D_2). \tag{105}
\end{equation}

We have the following inequality

\begin{align}
R_{X_1X_2}(D_1^0, D_2^0) &\ge R_{X_1}(D_1^0) + R_{X_2}(D_2^0) - I(X_1; X_2) \tag{106} \\
&= I(X_1, X_2; W), \tag{107}
\end{align}

where (106) is from (10c) and the equality (107) is from the
definition of $(D_1^0, D_2^0)$ in (48), the Markov chain $X_1 - W - X_2$,
and the chain rule.

On the other hand,

\begin{equation}
R_{X_1X_2}(D_1^0, D_2^0) \le I(X_1, X_2; X_1^0, X_2^0) \le I(X_1, X_2; W), \tag{108}
\end{equation}

where the first inequality is from the definition of rate
distortion function and the second inequality is from the
Markov chain $(X_1, X_2) - W - (X_1^0, X_2^0)$ and the chain rule.
Combining (107) and (108), we have
$R_{X_1X_2}(D_1^0, D_2^0) = I(X_1, X_2; X_1^0, X_2^0) = I(X_1, X_2; W)$.

Let $(\hat{X}_1, \hat{X}_2)$ be the random variables achieving
$R_{X_1X_2}(D_1, D_2)$. As the vector source $(X_1, X_2)$ is successively
refinable under individual distortion constraints, by Theorem 2,
we have the Markov chain $(X_1, X_2) - (\hat{X}_1, \hat{X}_2) - (X_1^0, X_2^0)$.
Therefore,

\begin{align*}
R_{X_1X_2}(D_1, D_2) - I(X_1, X_2; W)
&= I(X_1, X_2; \hat{X}_1, \hat{X}_2) - I(X_1, X_2; X_1^0, X_2^0) \\
&= I(X_1, X_2; \hat{X}_1, \hat{X}_2 | X_1^0, X_2^0) \\
&\ge R_{X_1X_2|W}(D_1, D_2),
\end{align*}

where the last inequality is from the Markov chain
$(X_1, X_2) - W - (X_1^0, X_2^0)$. On the other hand, by Lemma 1, we have

\[
R_{X_1X_2|W}(D_1, D_2) + I(X_1, X_2; W) \ge R_{X_1X_2}(D_1, D_2).
\]

This establishes (105). Thus, from Lemma 3, $C(D_1, D_2) \le C(X_1, X_2)$.

To complete the proof, we need to show

\begin{equation}
R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) = R_{X_1X_2}(D_1, D_2), \tag{109}
\end{equation}

which yields $C(D_1, D_2) \ge C(X_1, X_2)$ in view of Lemma 4.
From Lemma 1,

\[
R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) \le R_{X_1X_2}(D_1, D_2).
\]

Therefore, we only need to establish the other direction. For
$i = 1, 2$, let $\hat{X}_i$ be the random variable achieving $R_{X_i}(D_i)$;
then by the Markov property of successively refinable scalar
source given in Theorem 1, we have the Markov chain
$X_i - \hat{X}_i - X_i^0$ for $D_i \le D_i^0$. Therefore,

\begin{align}
R_{X_i}(D_i) - I(X_i; W) &= I(X_i; \hat{X}_i) - I(X_i; X_i^0) \nonumber \\
&= I(X_i; \hat{X}_i | X_i^0) \nonumber \\
&\ge R_{X_i|W}(D_i), \tag{110}
\end{align}

where (110) is from the Markov chain $X_i - W - X_i^0$.

Using (110), we have

\begin{align}
R_{X_1}(D_1) &+ R_{X_2}(D_2) - I(X_1; X_2) \nonumber \\
&\ge R_{X_1|W}(D_1) + I(X_1; W) + R_{X_2|W}(D_2) + I(X_2; W) - I(X_1; X_2) \nonumber \\
&= R_{X_1|W}(D_1) + R_{X_2|W}(D_2) + I(X_1, X_2; W) \nonumber \\
&\ge R_{X_1X_2|W}(D_1, D_2) + I(X_1, X_2; W) \tag{111} \\
&= R_{X_1X_2}(D_1, D_2), \tag{112}
\end{align}

where (111) is because $X_1 - W - X_2$ and the equation (11b), and
(112) is from the equation (105). This completes the proof.

Appendix  E 
Proof  of  Theorem  7 

First, we will show that the common information of $X_1, X_2$
is only a function of the correlation coefficient $\rho$. To show
this, let $\tilde{X}_i = X_i / \sigma_i$, $i = 1, 2$; thus $\tilde{X}_1, \tilde{X}_2$ are jointly Gaussian
distributed with zero mean and covariance matrix

\begin{equation}
\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. \tag{113}
\end{equation}

We have the Markov chain $\tilde{X}_1 - X_1 - X_2 - \tilde{X}_2$ and,
by the data processing inequality for Wyner's common information [13],
$C(\tilde{X}_1, \tilde{X}_2) \le C(X_1, X_2)$. On the other hand,
we have the Markov chain $X_1 - \tilde{X}_1 - \tilde{X}_2 - X_2$ and
$C(\tilde{X}_1, \tilde{X}_2) \ge C(X_1, X_2)$. Thus, $C(\tilde{X}_1, \tilde{X}_2) = C(X_1, X_2)$.
Without loss of generality, we will consider $\sigma_1^2 = \sigma_2^2 = 1$,
i.e., the covariance matrix is in the form (113) instead
of (61).

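Let

\begin{equation}
X_i = \sqrt{\rho}\, W + \sqrt{1 - \rho}\, N_i, \quad i = 1, 2, \tag{114}
\end{equation}

where $W, N_1, N_2$ are mutually independent standard Gaussian
random variables. It is clear that $X_1, X_2$ are bivariate Gaussian
with correlation coefficient $\rho$, and hence

\[
C(X_1, X_2) \le I(X_1, X_2; W) = \frac{1}{2} \log \frac{1 + \rho}{1 - \rho}.
\]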
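The value of $I(X_1, X_2; W)$ for the construction (114) can be checked from Gaussian differential entropies: $h(X_1, X_2) - h(X_1, X_2 | W)$ reduces to half the log-ratio of two covariance determinants. The sketch below assumes $0 \le \rho < 1$ and natural logarithms; the function names are illustrative and not taken from the paper.

```python
import math


def covariance(rho):
    """Covariance of (X1, X2) built as in (114) from independent W, N1, N2."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    # Each row lists the coefficients of one X_i on (W, N1, N2).
    rows = [(a, b, 0.0), (a, 0.0, b)]
    return [[sum(u * v for u, v in zip(r, s)) for s in rows] for r in rows]


def mi_with_common_part(rho):
    """I(X1,X2;W) in nats, as h(X1,X2) - h(X1,X2|W)."""
    cov = covariance(rho)
    det_joint = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    # Given W, X1 and X2 are independent, each with variance 1 - rho.
    det_cond = (1.0 - rho) ** 2
    # The (2*pi*e)^2 factors of the two entropies cancel in the ratio.
    return 0.5 * math.log(det_joint / det_cond)


def closed_form(rho):
    """The expression (1/2) log((1 + rho) / (1 - rho))."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))


for rho in (0.1, 0.5, 0.9):
    print(rho, mi_with_common_part(rho), closed_form(rho))
```

The two printed columns should agree up to rounding, since $(1 - \rho^2)/(1 - \rho)^2 = (1 + \rho)/(1 - \rho)$.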

Next we will show that

\[
C(X_1, X_2) \ge \frac{1}{2} \log \frac{1 + \rho}{1 - \rho}.
\]

For any $U$ that satisfies the Markov chain $X_1 - U - X_2$, let
$D_1$ be the minimum mean square error (MMSE) of estimating
$X_1$ using $U$; thus, $D_1 = E[X_1 - E(X_1|U)]^2$. Similarly, let
$D_2 = E[X_2 - E(X_2|U)]^2$. We now show that
$I(X_1, X_2; U) \ge \frac{1}{2} \log \frac{1 + \rho}{1 - \rho}$.

\begin{align}
I(X_1, X_2; U)
&= H(X_1, X_2) - H(X_1 | U) - H(X_2 | U) \nonumber \\
&= I(X_1; U) + I(X_2; U) - I(X_1; X_2) \tag{115} \\
&\ge I(X_1; E(X_1|U)) + I(X_2; E(X_2|U)) - I(X_1; X_2) \tag{116} \\
&\ge R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1; X_2) \tag{117} \\
&= \frac{1}{2} \log \frac{1 - \rho^2}{D_1 D_2}, \quad D_1, D_2 \le 1, \nonumber
\end{align}

where (115) is from the chain rule, (116) is from the Markov
chains $X_1 - U - E(X_1|U)$, $X_2 - U - E(X_2|U)$ and (117) is
by the definition of rate distortion function.
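The last equality in the chain above uses the standard rate distortion function of a unit-variance Gaussian source under squared error, $R(D) = \frac{1}{2} \log(1/D)$ for $D \le 1$, together with $I(X_1; X_2) = -\frac{1}{2} \log(1 - \rho^2)$. The sketch below evaluates the right-hand side of (117) under those assumptions (natural logarithms, unit variances). As an illustration it also plugs in $U = W$ from (114), for which $E(X_i | W) = \sqrt{\rho}\, W$ and the MMSE is $1 - \rho$; the function names are hypothetical.

```python
import math


def gaussian_rate(d, variance=1.0):
    """Gaussian rate distortion function under squared error, in nats."""
    return max(0.0, 0.5 * math.log(variance / d))


def gaussian_mi(rho):
    """I(X1;X2) for a unit-variance jointly Gaussian pair with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


def lower_bound(d1, d2, rho):
    """Right-hand side of (117): R(D1) + R(D2) - I(X1;X2)."""
    return gaussian_rate(d1) + gaussian_rate(d2) - gaussian_mi(rho)


rho = 0.5
# Generic distortions below 1: the bound matches (1/2) log((1 - rho^2) / (D1 D2)).
d1, d2 = 0.3, 0.7
print(lower_bound(d1, d2, rho), 0.5 * math.log((1 - rho ** 2) / (d1 * d2)))

# Choosing U = W from (114) gives D1 = D2 = 1 - rho, where the bound
# coincides with (1/2) log((1 + rho) / (1 - rho)).
print(lower_bound(1 - rho, 1 - rho, rho), 0.5 * math.log((1 + rho) / (1 - rho)))
```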

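Next we show that $D_1 + D_2 \le 2(1 - \rho)$, $D_1 \le 1$, $D_2 \le 1$.

In addition, we have $D_1 = E[X_1 - E(X_1|U)]^2 = EX_1^2 - E[E(X_1|U)^2] \le EX_1^2 = 1$.
Thus, using $D_1 D_2 \le \left( (D_1 + D_2)/2 \right)^2 \le (1 - \rho)^2$,

\begin{align*}
I(X_1, X_2; U) &\ge \frac{1}{2} \log \frac{1 - \rho^2}{D_1 D_2} \\
&\ge \frac{1}{2} \log \frac{1 - \rho^2}{(1 - \rho)^2} \\
&= \frac{1}{2} \log \frac{1 + \rho}{1 - \rho}.
\end{align*}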
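The step from $\frac{1}{2} \log \frac{1 - \rho^2}{D_1 D_2}$ to $\frac{1}{2} \log \frac{1 + \rho}{1 - \rho}$ only needs the constraint $D_1 + D_2 \le 2(1 - \rho)$, since the product $D_1 D_2$ is then at most $(1 - \rho)^2$ by the arithmetic-geometric mean inequality. The following sketch scans a grid of feasible distortion pairs to illustrate this; the grid resolution and the function names are arbitrary choices, not part of the paper.

```python
import math


def bound(d1, d2, rho):
    """The quantity (1/2) log((1 - rho^2) / (D1 D2)), in nats."""
    return 0.5 * math.log((1.0 - rho ** 2) / (d1 * d2))


def min_bound_on_grid(rho, steps=200):
    """Smallest bound over grid points with D1, D2 <= 1 and D1 + D2 <= 2(1 - rho)."""
    limit = 2.0 * (1.0 - rho)
    best = math.inf
    for i in range(1, steps + 1):
        d1 = i / steps
        for j in range(1, steps + 1):
            d2 = j / steps
            # Small tolerance so that boundary points survive rounding.
            if d1 + d2 <= limit + 1e-12:
                best = min(best, bound(d1, d2, rho))
    return best


for rho in (0.2, 0.5, 0.8):
    target = 0.5 * math.log((1.0 + rho) / (1.0 - rho))
    print(rho, min_bound_on_grid(rho), target)
```

On each row the grid minimum should be no smaller than the target, and it meets the target whenever the grid contains the point $D_1 = D_2 = 1 - \rho$.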
\begin{align}
2(1 - \rho)
&= E(X_1 - X_2)^2 \nonumber \\
&= E[X_1 - E(X_1|U) + E(X_1|U) - X_2]^2 \nonumber \\
&= E[X_1 - E(X_1|U)]^2 + E[E(X_1|U) - X_2]^2 \nonumber \\
&\quad + 2E[(X_1 - E(X_1|U))(E(X_1|U) - X_2)] \nonumber \\
&= E[X_1 - E(X_1|U)]^2 + E[E(X_1|U) - X_2]^2 \tag{118} \\
&= E[X_1 - E(X_1|U)]^2 + E[E(X_1|U) - E(X_2|U) + E(X_2|U) - X_2]^2 \nonumber \\
&= E[X_1 - E(X_1|U)]^2 + E[X_2 - E(X_2|U)]^2 + E[E(X_2|U) - E(X_1|U)]^2 \nonumber \\
&\quad + 2E[(X_2 - E(X_2|U))(E(X_2|U) - E(X_1|U))] \nonumber \\
&= E[X_1 - E(X_1|U)]^2 + E[X_2 - E(X_2|U)]^2 + E[E(X_2|U) - E(X_1|U)]^2 \tag{119} \\
&\ge D_1 + D_2, \nonumber
\end{align}

where (118) is from

\begin{align*}
E[(X_1 - E(X_1|U))(E(X_1|U) - X_2)]
&= E[(X_1 - E(X_1|U)) E(X_1|U)] - E[(X_1 - E(X_1|U)) X_2] \\
&= -E[(X_1 - E(X_1|U)) X_2] \\
&= -E_{U X_2}[X_2 E_{X_1|U}[X_1 - E(X_1|U)]] \\
&= -E_{U X_2}[X_2 (E(X_1|U) - E(X_1|U))] \\
&= 0,
\end{align*}

and (119) is from

\begin{align*}
E[(X_2 - E(X_2|U))(E(X_2|U) - E(X_1|U))]
&= E[(X_2 - E(X_2|U)) E(X_2|U)] \\
&\quad - E[(X_2 - E(X_2|U)) E(X_1|U)]
\end{align*}